-
Sockets programming
- This week we will cover the following topics:
- Sockets programming
- Classic synchronous TCP stream sockets in C on Linux
- Streams
- Writing TCP servers
- Writing TCP clients
- Sample echo-socket program
- The Message Boundary Problem and its solutions
- Sample message-socket-client and message-socket-server programs
- Byte Ordering
-
9.1 Sockets programming
9.1.1 Basic concepts
Sockets programming involves writing two programs: a server and a client. They communicate through the sockets programming interface, which all modern operating systems provide; it offers low-level functions that a local process calls to communicate with a remote process over TCP, UDP or raw IP.
- Sockets programming is a big topic. We are going to focus on:
- Synchronous stream sockets, i.e. TCP sockets coded the traditional way with read() and write(). This is what most people use (note the similarity to file I/O). Transmission is full-duplex, i.e. data can flow in both directions at the same time, so both the client and the server will call read() and write().
- We will then consider what is wrong with this approach and how to fix some of the problems.
Streams: sockets are accessed through a stream abstraction. Even though a message is split into segments and packets on the wire, programs see it as a stream of bytes; the API hides the packets. The same idea is used in file systems, so programming with sockets can be quite like programming file access: you call read() and write(), but on a descriptor that refers to a socket rather than to an open file.
- Before writing a network application, decide which protocol to use (TCP, UDP, or IP), as this determines which type of socket is needed:
- Stream socket: To be used with a connection-oriented transport protocol i.e. TCP.
- Datagram socket: To be used with a connectionless transport protocol i.e. UDP (we will not consider).
- Raw socket: Directly use services of IP (we will not consider).
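In C on Linux the socket type is chosen by the second argument to socket(). The sketch below is a minimal illustration, not the lab code: open_socket and socket_type are hypothetical helper names, and IPv4 (AF_INET) is assumed. Raw sockets are left out because creating them normally requires elevated privileges.

```c
#include <sys/socket.h>

/* Create an IPv4 socket of the given type:
 *   SOCK_STREAM -> stream socket (TCP)
 *   SOCK_DGRAM  -> datagram socket (UDP)
 * Returns the socket descriptor, or -1 on error.
 * (open_socket is a hypothetical name used only in this sketch.) */
int open_socket(int type)
{
    return socket(AF_INET, type, 0);   /* 0 = default protocol for this type */
}

/* Ask the kernel which type an existing socket has.
 * Returns SOCK_STREAM, SOCK_DGRAM, ... or -1 on error. */
int socket_type(int fd)
{
    int type = 0;
    socklen_t len = sizeof type;

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

Calling open_socket(SOCK_STREAM) gives the kind of socket used throughout the rest of these notes; the descriptor should be released with close() when finished.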
-
9.1.2 What is a Socket?
You need a socket in both the client and the server. Programmatically, a socket is a data structure that binds together the information needed for a specific communication over one of the Internet protocols. The socket includes a unique id, details of the protocol (TCP, UDP, etc.), its state (for example, whether it is currently listening for connections) and the socket address of the host. A client will typically have one socket, whereas a server will have one listening socket plus an additional socket for each established client connection.
Endpoints and Sockets: When programming a server or a client, we must bind (i.e. associate) the socket to a socket address (also known as an endpoint). An endpoint is a combination of IP address and port number, e.g:
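As a rough illustration, the endpoint 127.0.0.1:50001 could be built in C as shown below. The loopback address is an assumption made for this sketch; port 50001 is the one used by the echo-socket example. make_endpoint is a hypothetical helper name, not part of the lab code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fill in an IPv4 socket address (endpoint) from a dotted-quad
 * string and a port number.
 * Returns 0 on success, -1 if the address string is not valid.
 * (make_endpoint is a hypothetical name used only in this sketch.) */
int make_endpoint(struct sockaddr_in *addr, const char *ip, uint16_t port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;        /* IPv4 */
    addr->sin_port = htons(port);      /* port, in network byte order */

    /* inet_pton() returns 1 when the text was a valid IPv4 address */
    if (inet_pton(AF_INET, ip, &addr->sin_addr) != 1)
        return -1;
    return 0;
}
```

The filled-in struct sockaddr_in is what later gets passed (cast to struct sockaddr *) to bind() on the server or connect() on the client.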
-
9.1.3 Example - echo-socket example (Lab 8)
- An important decision when writing a network program is your own application protocol: the rules that govern communication between the server and the client in your own code.
- In this example, the server communicates with a client under the protocol summarised in the sequence diagram below.
- It uses synchronous TCP sockets.
- Example - echo-socket (Lab 8):
- Uses port 50001
- In the code read() and write() will exist in pairs.
- remember both read() and write() will be in both the client and the server program
- each Send() in the sequence diagram should be interpreted as a read()/write() pair
- with each write() at the sender there will be a corresponding read() at the receiver but this is not shown in the sequence diagram
- you have to be careful to match read() and write() up across the sender and receiver
- A good approach is to write a single interaction first, test it and incrementally add more.
- Obviously you have to make sure the server is running before running the client...
- Coding with sockets is very much a pattern.
- don’t deviate from what you know works!
- When a client host wishes to make a connection, TCP uses a socket to send out a request message to the server machine defined in the socket address. The message is then passed to the IP layer, which assembles an IP packet for transmission to the destination.
- use socket() to create a socket of correct type
- create a socket address using sockaddr_in
- use connect() to attempt a connection
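The three client-side steps above (socket(), sockaddr_in, connect()) might be combined roughly as follows. This is a sketch under stated assumptions rather than the Lab 8 code: connect_to is a hypothetical helper name, IPv4 is assumed, and the server address is given as a dotted-quad string.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect a TCP stream socket to ip:port.
 * Returns the connected socket descriptor, or -1 on error.
 * (connect_to is a hypothetical name used only in this sketch.) */
int connect_to(const char *ip, uint16_t port)
{
    struct sockaddr_in serv;

    /* 1. create a socket of the correct type (stream = TCP) */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    /* 2. create the socket address of the server */
    memset(&serv, 0, sizeof serv);
    serv.sin_family = AF_INET;
    serv.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &serv.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    /* 3. attempt the connection; blocks until it succeeds or fails */
    if (connect(fd, (struct sockaddr *)&serv, sizeof serv) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A client for the echo-socket example would call something like connect_to("127.0.0.1", 50001) (the loopback address being an assumption for a same-machine test), then use read() and write() on the returned descriptor and close() it when done.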
-
9.1.4 Using Stream Sockets
- When the server receives the connection request, it returns a message containing its own unique id and socket address. This handshake identifies a virtual connection between the two processes.
- use socket() to create a socket of correct type
- create a socket address using sockaddr_in, this time set to listen on all network interfaces (network cards)
- use bind() to associate socket address with socket and create the endpoint
- use listen() to put server into listening state
- use accept() to wait for connections
- this blocks and returns when a client connects
- accept() returns a new socket specific to the client
- After the connection has been made the data can flow between the two hosts (called a data stream).
- using read() and write() as appropriate
- you need to code to coordinate these in client and server
- note that read() and write() should be passed a buffer of unsigned char, which lets you send anything, i.e. text or binary data
So you have to make sure you can construct an array of unsigned char from what you want to send (and reconstruct the original data type at the other end); this is called serialisation/de-serialisation.
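A minimal sketch of serialisation and de-serialisation is given below. The struct reading record, its fields and the function names are all hypothetical, invented for this illustration. Fields are copied with memcpy() in host byte order, which assumes the client and server use the same byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_LEN 8
#define READING_WIRE_SIZE (sizeof(uint32_t) + NAME_LEN)

/* Hypothetical record used only for this illustration. */
struct reading {
    uint32_t id;
    char name[NAME_LEN];
};

/* Serialise: copy each field into the byte buffer at a fixed offset.
 * buf must have room for READING_WIRE_SIZE bytes.
 * Returns the number of bytes used.
 * Assumption: both ends share the same byte order. */
size_t serialise_reading(const struct reading *r, unsigned char *buf)
{
    memcpy(buf, &r->id, sizeof r->id);              /* bytes 0..3  */
    memcpy(buf + sizeof r->id, r->name, NAME_LEN);  /* bytes 4..11 */
    return READING_WIRE_SIZE;
}

/* De-serialise: rebuild the record from the same fixed offsets. */
void deserialise_reading(const unsigned char *buf, struct reading *r)
{
    memcpy(&r->id, buf, sizeof r->id);
    memcpy(r->name, buf + sizeof r->id, NAME_LEN);
}
```

Copying field by field, rather than copying the whole struct in one go, keeps any compiler-inserted padding out of the transmitted bytes, so the layout on the wire is fixed by the code and not by the compiler.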
TCP receivers send back acknowledgements for data that has been successfully received. This lets TCP detect lost packets, or that the remote host has crashed and been restarted. Make sure the appropriate code is called to close sockets properly and free resources (see listings). By default read() and write() block: read() waits until at least some data has arrived, and write() waits until the data has been handed over to the kernel for sending. These are synchronous sockets; there are also non-blocking, asynchronous sockets, which are not considered in this module.
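The server-side steps in this section (socket(), bind(), listen(), accept(), then read()/write() and close()) could be sketched as follows. This is an illustration under assumptions, not the Lab 8 listing: make_listener and serve_one_echo are hypothetical names, the backlog of 10 and the 256-byte buffer are arbitrary, and SO_REUSEADDR is an optional extra that makes restarting the server on the same port easier.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on all network interfaces.
 * Returns the listening descriptor, or -1 on error. */
int make_listener(uint16_t port)
{
    struct sockaddr_in serv;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on); /* optional */

    memset(&serv, 0, sizeof serv);
    serv.sin_family = AF_INET;
    serv.sin_addr.s_addr = htonl(INADDR_ANY);   /* all interfaces */
    serv.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&serv, sizeof serv) == -1 ||
        listen(fd, 10) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait for one client, echo its bytes back until it stops sending,
 * then close the per-client socket. Returns 0 on success, -1 on error. */
int serve_one_echo(int listenfd)
{
    unsigned char buf[256];
    ssize_t n;
    int connfd = accept(listenfd, NULL, NULL);  /* blocks until a client connects */
    if (connfd == -1)
        return -1;

    while ((n = read(connfd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                       /* write() may send only part */
            ssize_t w = write(connfd, buf + off, (size_t)(n - off));
            if (w <= 0) {
                close(connfd);
                return -1;
            }
            off += w;
        }
    }
    close(connfd);                              /* free the client socket */
    return n == 0 ? 0 : -1;                     /* 0 bytes read = client finished */
}
```

A server for the echo-socket example would call make_listener(50001) once and then call serve_one_echo() in a loop, closing the listening socket on shutdown.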
-
9.1.5 Sockets – Message Boundary Problem
Using basic sockets is attractive as the API is almost universally standard across languages and operating systems and is well understood. However, there are a couple of problems with using raw read() and write(). Notice that both return an integer indicating how many bytes were actually sent or received (make sure you read the comments about read() in echo-socket client.c).
Consider, for example, an attempt to send 1 KB of data. The write() function returns 1024, but the read() function in the other process returns, say, 512. You have to keep calling read() until you have a total of 1024 bytes, and reassemble the data yourself. This is unlikely to happen if the processes are on the same host, but across a network it may (or may not) happen.
Luckily, reassembling data of a given size is fairly straightforward. Kerrisk provides implementations of readn() and writen(), which we can use to make sure n bytes are fully read/transmitted; readn() calls read() repeatedly until the specified n bytes have been read, reassembling the data as it goes.
- see Lab 8 message-socket-server (and client)
- program – rdwrn.c and rdwrn.h
- this is implemented in a separate code module (like hexdump() previously with the same Makefile).
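To show the idea behind readn() and writen(), here is a minimal sketch of the same looping technique. It is not the code from rdwrn.c or from Kerrisk; read_all and write_all are hypothetical names chosen to avoid confusion with the lab module.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read() until n bytes have arrived or the peer closes.
 * Returns the number of bytes read (less than n only at end-of-file),
 * or -1 on error. */
ssize_t read_all(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;
    size_t total = 0;

    while (total < n) {
        ssize_t r = read(fd, p + total, n - total);
        if (r == 0)
            break;                  /* end-of-file: peer closed */
        if (r == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)r;         /* may be fewer bytes than requested */
    }
    return (ssize_t)total;
}

/* Keep calling write() until all n bytes have been accepted.
 * Returns n on success, or -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t n)
{
    const unsigned char *p = buf;
    size_t total = 0;

    while (total < n) {
        ssize_t w = write(fd, p + total, n - total);
        if (w <= 0) {
            if (w == -1 && errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)w;
    }
    return (ssize_t)total;
}
```

With helpers like these, the 1 KB example above becomes a single write_all(fd, data, 1024) at the sender and a single read_all(fd, data, 1024) at the receiver, regardless of how many underlying read() calls are needed.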
- However there remains a second problem with this: how does the receiving process know the value of n? There are three solutions:
- send fixed length messages (poor)
- send messages delimited by an agreed character
- not good for binary data as you want to be able to send anything (data transparency) but fine for text
- send 2 part messages
- a header with the length of the payload data followed by the payload data itself
- the header is a known fixed length (say 4 bytes)
- It is best to send 2 part messages so the payload can be of arbitrary length, which we will do:
- Lab 8 message-socket-client
- Lab 8 message-socket-server
We still have to make sure that the length of the payload is transmitted first. That length is a known, fixed 4 bytes, and it should also be sent and received with writen() and readn(): there is no guarantee you will receive even 4 bytes in one go!
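The two-part message scheme might look roughly like the sketch below. It is not the Lab 8 message-socket code: send_message, recv_message and the two static helpers are hypothetical names, and sending the 4-byte header in network byte order with htonl() is an assumption made here so that the sketch does not depend on both hosts sharing a byte order.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read exactly n bytes; 0 on success, -1 on error or early end-of-file. */
static int read_exact(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r == -1 && errno == EINTR) continue;
        if (r <= 0) return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Write exactly n bytes; 0 on success, -1 on error. */
static int write_exact(int fd, const void *buf, size_t n)
{
    const unsigned char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w == -1 && errno == EINTR) continue;
        if (w <= 0) return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Send a 4-byte length header followed by the payload. */
int send_message(int fd, const void *payload, uint32_t len)
{
    uint32_t header = htonl(len);       /* assumption: header in network order */
    if (write_exact(fd, &header, sizeof header) == -1)
        return -1;
    return write_exact(fd, payload, len);
}

/* Receive one message. Returns a malloc()ed payload (caller frees) and
 * stores its length in *len; returns NULL on error or end-of-file. */
unsigned char *recv_message(int fd, uint32_t *len)
{
    uint32_t header;
    unsigned char *payload;

    if (read_exact(fd, &header, sizeof header) == -1)
        return NULL;
    *len = ntohl(header);

    payload = malloc(*len ? *len : 1);  /* avoid a zero-sized allocation */
    if (payload == NULL)
        return NULL;
    if (read_exact(fd, payload, *len) == -1) {
        free(payload);
        return NULL;
    }
    return payload;
}
```

Because the receiver always reads the fixed-size header first, it knows exactly how many payload bytes to wait for, so messages of arbitrary length and arbitrary (binary) content can be sent back to back on the same stream. A real server would normally also cap the length it is prepared to accept before allocating memory.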
-
9.1.6 Byte Ordering – be aware!
Another issue with sockets is that the order of the bytes making up a word (remember integers are 4 bytes here) differs between architectures, for example little-endian and big-endian. If the client and server differ, the data will be misinterpreted unless care is taken! This is not a problem for us, as our C clients and server all run on Linux on the same type of processor.
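Sockets distinguish host byte order (whatever the local processor uses) from network byte order (big-endian); functions such as htonl() and ntohl() convert between the two.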
See: http://www.developer.com/net/cplus/article.php/3329251/Sockets-Byte-Ordering-Primer.htm
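As a small illustration of converting between host and network byte order, the sketch below packs a 32-bit integer into 4 bytes using the standard htonl() and ntohl() functions. put_u32 and get_u32 are hypothetical helper names, not part of the lab code.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value into 4 bytes in network byte order (big-endian). */
void put_u32(unsigned char out[4], uint32_t value)
{
    uint32_t net = htonl(value);    /* host order -> network order */
    memcpy(out, &net, sizeof net);
}

/* Recover a 32-bit value from 4 bytes held in network byte order. */
uint32_t get_u32(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);              /* network order -> host order */
}
```

After put_u32(buf, 0x01020304) the buffer holds the bytes 01 02 03 04 on any host, whether it is little-endian or big-endian, and get_u32(buf) returns the original value. The same conversion is already applied to port numbers whenever htons() is used to fill in sin_port.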
-
9.1.7 Threads
- We will improve our program after the next lecture by introducing threads, allowing multiple clients to connect to a server simultaneously. This gives a more elegant solution but nearly all the code introduced today can be reused with minor code changes. Therefore the coursework can be started now and threads incorporated later on in the development.
- using Lab 8 message-socket-server (and client)
9.1.8 Prerequisites for next time
- Reviewed code from Lab 8 (sockets directory):
- simple TCP synchronous sockets (echo-socket)
- solution to message boundary problem (message-socket-server and message-socket-client)