Sockets Programming

Computer science is built on the representation of knowledge. Data can be considered in terms of either its content or its form, and protocols deal with form.

Once a form is fixed, data processing can be automated, because the form predefines how the machine behaves. As developers, we can therefore focus on form rather than content.
Protocols, in turn, are the category of forms used for communication between separate machines.

In this set of articles, we first give general definitions of the concepts used in protocols. Next, we introduce the software concept of sockets, which different protocols use to exchange messages.

Finally, we give some sample Java code to help you build your own network API.

Network Protocol:

Network protocols are formal standards and policies, made up of rules, procedures and formats, that define communication between two or more devices over a network. They govern the end-to-end processes of timely, secure and managed data or network communication.

Java Sockets under the hood

Summary before we start coding:

  • An endpoint is a combination of an IP address and a port, so every TCP connection can be identified by a pair of endpoints
  • The server waits, listening on its socket (identified by the server's IP address and a local port), for a client to make a connection request
  • The client tries to rendezvous with the server on the server's machine and port
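
To make the first point concrete, here is a minimal sketch that opens one TCP connection over the loopback interface and reads the pair of endpoints from both sides. The class name EndpointDemo and the method connectionEndpoints are hypothetical names chosen for this illustration, and running both ends in one process is only a convenience for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of any library.
class EndpointDemo {
    /** Returns {clientLocal, clientRemote, serverLocal, serverRemote} for one loopback connection. */
    static InetSocketAddress[] connectionEndpoints() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system to pick any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket accepted = listener.accept()) {
            return new InetSocketAddress[] {
                (InetSocketAddress) client.getLocalSocketAddress(),
                (InetSocketAddress) client.getRemoteSocketAddress(),
                (InetSocketAddress) accepted.getLocalSocketAddress(),
                (InetSocketAddress) accepted.getRemoteSocketAddress()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress[] e = connectionEndpoints();
        System.out.println("client side: " + e[0] + " -> " + e[1]);
        System.out.println("server side: " + e[2] + " -> " + e[3]);
    }
}
```

The two printed lines should show the same two endpoints in mirrored order: the client's local endpoint is the server's remote endpoint, and vice versa.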

After this brief introduction, we will present some classes that implement the tools we need to make two or more systems exchange data with each other.

Let’s look a little under the hood:

The java.net package provides the Socket class, which implements one side of a bidirectional connection. One of its constructors takes the remote IP address and port as input. The Socket class is also called the client socket. It sits on top of a platform-dependent implementation, hiding the details of any particular system from your Java program. By using the java.net.Socket class instead of relying on native code, your Java programs can communicate over the network in a platform-independent fashion.

Additionally, java.net includes the ServerSocket class, which implements the server-side socket that servers use to listen for and accept connections from clients. Later on, we will see how to use the Socket and ServerSocket classes in sample code.

Other classes provide higher-level features. If you are connecting to a remote web server, the URL-related classes (URLConnection, URLEncoder) are probably more appropriate than the socket classes. In fact, URLs are a relatively high-level connection to the Web and use sockets as part of the underlying implementation. In the next chapters we will see how to deal with those classes.
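
As a small taste of these higher-level helpers, the sketch below uses URLEncoder to prepare a query-string parameter, something you would otherwise have to escape by hand before writing it to a raw socket. The class QueryEncodingDemo and the method queryParam are hypothetical names for this illustration, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class QueryEncodingDemo {
    /** Builds "key=value" with both parts encoded for use in a URL query string. */
    static String queryParam(String key, String value) {
        return URLEncoder.encode(key, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Spaces become '+', and '&' becomes %26 so it cannot be mistaken for a separator.
        System.out.println(queryParam("q", "java sockets & streams"));
        // prints: q=java+sockets+%26+streams
    }
}
```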

Introduction to sockets

Java uses the same concept as UNIX I/O (input/output):

A user process opens an object or device → reads or writes data from/to it → closes it, to inform the system that the object or device is no longer in use.

When facilities for InterProcess Communication (IPC) and networking were added to Unix, the idea was to make the interface to IPC look similar to file I/O. In Unix, a process has a set of I/O descriptors that one reads from and writes to. These descriptors may refer to files, devices, or communication channels (sockets). The lifetime of a descriptor is made up of three phases: creation (open socket), reading and writing (receive and send to socket), and destruction (close socket).
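
The three phases map directly onto the Java API. The following is a minimal sketch, assuming both ends of the connection live in the same process on the loopback interface; SocketLifecycleDemo and roundTrip are hypothetical names used only here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of any library.
class SocketLifecycleDemo {
    /** Sends one byte (0..255) over a loopback connection and returns the value the peer read. */
    static int roundTrip(int value) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            // Phase 1, creation: opening the sockets acquires the underlying descriptors.
            Socket client = new Socket(loopback, listener.getLocalPort());
            Socket peer = listener.accept();
            try {
                // Phase 2, reading and writing: plain stream I/O, much like a file.
                OutputStream out = client.getOutputStream();
                out.write(value);
                out.flush();
                InputStream in = peer.getInputStream();
                return in.read();
            } finally {
                // Phase 3, destruction: closing releases the descriptors.
                peer.close();
                client.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // prints: 42
    }
}
```

In everyday code, a try-with-resources statement (as used for the listener above) performs the closing phase automatically.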

What you should keep in mind:

A socket is one endpoint of a two-way (bidirectional) communication link between two programs running on the network. A socket is bound to a port number so that the TCP layer can identify the application to which the data is destined.

Example

Normally, a server runs on a specific computer and has a socket that is bound to a specific port number. The server just waits, listening to the socket for a client to make a connection request.

On the client side: The client knows the hostname of the machine on which the server is running and the port number on which the server is listening. To make a connection request, the client tries to rendezvous with the server on the server's machine and port. The client also needs to identify itself to the server, so it binds to a local port number that it will use during this connection. This port is usually assigned by the system.

If everything goes well, the server accepts the connection. Upon acceptance, the server gets a new socket bound to the same local port and also has its remote endpoint set to the address and port of the client. It needs a new socket so that it can continue to listen to the original socket for connection requests while tending to the needs of the connected client.
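
This behaviour is easy to observe. The sketch below, under the assumption that two clients connect from the same process over loopback, accepts both on a single listening socket and compares port numbers. AcceptDemo and acceptTwoClients are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of any library.
class AcceptDemo {
    /** Returns {listeningPort, firstLocalPort, secondLocalPort, firstRemotePort, secondRemotePort}. */
    static int[] acceptTwoClients() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);
             Socket clientA = new Socket(loopback, listener.getLocalPort());
             Socket clientB = new Socket(loopback, listener.getLocalPort());
             // Each accept() returns a new socket; the listener itself stays open.
             Socket first = listener.accept();
             Socket second = listener.accept()) {
            return new int[] {
                listener.getLocalPort(),
                first.getLocalPort(),   // same local port as the listener
                second.getLocalPort(),  // same local port as the listener
                first.getPort(),        // remote port of one client
                second.getPort()        // remote port of the other client
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] p = acceptTwoClients();
        System.out.println("listening on " + p[0]);
        System.out.println("accepted sockets use local ports " + p[1] + " and " + p[2]);
        System.out.println("their remote ports are " + p[3] + " and " + p[4]);
    }
}
```

Both accepted sockets report the listener's local port, and only their remote endpoints differ, which is what lets the TCP layer tell the two connections apart.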

On the client side, if the connection is accepted, a socket is successfully created and the client can use it to communicate with the server. From then on, the client and server communicate by writing to or reading from their sockets.
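
Putting the whole exchange together, here is a minimal sketch of a client and a one-shot server trading a single line of text. The server runs on a background thread in the same process purely for demonstration; EchoDemo, exchange, serveOneClient and the "echo: " reply prefix are all assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class EchoDemo {
    /** Client side: connects, sends one line, and returns the server's one-line reply. */
    static String exchange(String line) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            new Thread(() -> serveOneClient(listener)).start();
            try (Socket socket = new Socket(loopback, listener.getLocalPort());
                 PrintWriter out = writer(socket);
                 BufferedReader in = reader(socket)) {
                out.println(line);      // write to the socket
                return in.readLine();   // read from the socket
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Server side: accepts one connection and answers its first line. */
    static void serveOneClient(ServerSocket listener) {
        try (Socket socket = listener.accept();
             PrintWriter out = writer(socket);
             BufferedReader in = reader(socket)) {
            out.println("echo: " + in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // The second argument enables auto-flush, so println() sends the line immediately.
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        System.out.println(exchange("hello")); // prints: echo: hello
    }
}
```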

References:

https://docs.oracle.com/javase/tutorial/networking/sockets/definition.html

TCP/IP and UDP/IP communications

There are two kinds of communication that one can use for socket programming: datagram communication (UDP) and stream communication (TCP).

Datagram communication (UDP):

The datagram communication protocol, known as UDP (User Datagram Protocol), is a connectionless protocol. Because no connection exists to remember the peer, every datagram you send must carry the receiving socket's address, and it also carries the sender's address so that the receiver can reply. This addressing information therefore travels with every single message. In other words, a datagram is an independent, self-contained message sent over the network whose arrival, arrival time, and content are not guaranteed.
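
The fact that each datagram carries its own destination shows up directly in the API. The sketch below sends a single datagram over loopback, where delivery is very likely but still not guaranteed, hence the receive timeout. DatagramDemo and sendDatagram are hypothetical names, and the 1024-byte buffer and 2-second timeout are arbitrary choices.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class DatagramDemo {
    /** Sends one datagram over loopback and returns the text the receiver got. */
    static String sendDatagram(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(new InetSocketAddress(loopback, 0));
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // give up after 2 s instead of blocking forever

            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            // No connection: the destination address and port are set on the packet itself.
            DatagramPacket outgoing =
                    new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort());
            sender.send(outgoing);

            byte[] buffer = new byte[1024];
            DatagramPacket incoming = new DatagramPacket(buffer, buffer.length);
            receiver.receive(incoming); // fills in the data and the sender's address
            return new String(incoming.getData(), incoming.getOffset(),
                    incoming.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendDatagram("ping"));
    }
}
```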

Stream communication (TCP):

The stream communication protocol is known as TCP (Transmission Control Protocol). Unlike UDP, TCP is a connection-oriented protocol.

To communicate over TCP, a connection must first be established between the pair of sockets. While one of the sockets listens for a connection request (server), the other asks for a connection (client). Once two sockets have been connected, they can be used to transfer data in both (or either one of the) directions.

Now, you might ask what protocol you should use -- UDP or TCP? This depends on the client/server application you are writing. The following discussion shows the differences between the UDP and TCP protocols; this might help you decide which protocol you should use.

In UDP, as you have read above, every datagram you send must carry the address of the receiving socket. TCP, on the other hand, is a connection-oriented protocol, so a connection must be established before communication between the pair of sockets starts. So, there is a connection setup time in TCP.

In UDP, there is a size limit of 64 kilobytes on each datagram you can send to a specified location (over IPv4, at most 65,507 bytes of payload once the IP and UDP headers are subtracted), while in TCP there is no such limit. Once a connection is established, the pair of sockets behaves like streams: all available data are read immediately in the same order in which they are received.

UDP is an unreliable protocol -- there is no guarantee that the datagrams you have sent will arrive at all, or that they will be received in the same order by the receiving socket. On the other hand, TCP is a reliable protocol; it is guaranteed that the data you send will be received in the order in which it was sent.
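
Both TCP properties, the absence of a size limit and in-order delivery, can be checked with a short experiment. The sketch below streams a payload larger than any single datagram over one loopback connection and compares what arrives with what was sent. StreamOrderDemo and arrivesInOrder are hypothetical names, the 5-second read timeout is an arbitrary safety net, and readAllBytes assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class StreamOrderDemo {
    /** Streams `size` bytes over one TCP connection; true if they arrive complete and in order. */
    static boolean arrivesInOrder(int size) {
        byte[] sent = new byte[size];
        for (int i = 0; i < size; i++) {
            sent[i] = (byte) (i % 251); // recognizable, position-dependent pattern
        }
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket peer = listener.accept()) {
            peer.setSoTimeout(5000);
            // Write on a separate thread so the reader can drain the socket buffers meanwhile.
            Thread writer = new Thread(() -> {
                try {
                    OutputStream out = client.getOutputStream();
                    out.write(sent);
                    client.shutdownOutput(); // signals end-of-stream to the reader
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();
            byte[] received = peer.getInputStream().readAllBytes();
            return Arrays.equals(sent, received);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 200,000 bytes is well beyond the 64 KB datagram limit.
        System.out.println(arrivesInOrder(200_000)); // prints: true
    }
}
```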

In short, TCP is useful for implementing network services -- such as remote login (rlogin, telnet) and file transfer (FTP) -- which require data of indefinite length to be transferred. UDP is less complex and incurs less overhead. It is often used in implementing client/server applications in distributed systems built over local area networks.