Week 9 - Socket Programming

Sockets programming

In this week, we will cover the following topics:

Sockets programming

Classic synchronous TCP stream sockets in C on Linux

Streams

Writing TCP servers

Writing TCP clients

Sample echo-socket program

The Message Boundary Problem and its solutions

Sample message-socket-client and message-socket-server programs

Byte Ordering
9.1 Sockets programming

9.1.1 Basic concepts

Sockets programming involves writing two programs: a server and a client. Communication between them is achieved using the sockets programming interface (provided by all modern operating systems), which provides low-level functions that a local process calls to communicate with a remote process over TCP, UDP or IP.

Sockets programming is a big topic. We are going to focus on:

Synchronous stream sockets, i.e. using TCP, coded the traditional way with read() and write(). This is what most people use (note the similarity to file I/O). The connection is full-duplex, i.e. data can flow both ways at the same time, so read() and write() will appear in both the client and the server.

We will then consider what is wrong with this approach and how to fix some of the problems.

Streams: sockets are accessed through a stream abstraction. Even though a message is split into segments and packets on the wire, programs see the message as a stream of bytes; the API hides the packets. This idea is also used in file systems, so programming with sockets can be quite like programming file access (read() and write(), but called on a socket descriptor rather than on the descriptor of an open file).

Before writing a network application, decide which protocol to use (TCP, UDP, or IP), as this determines the type of socket required:

Stream socket: To be used with a connection-oriented transport protocol i.e. TCP.

Datagram socket: To be used with a connectionless transport protocol i.e. UDP (we will not consider).

Raw socket: Directly use services of IP (we will not consider).
9.1.2 What is a Socket?

You need a socket in both the client and the server. Programmatically a socket is a data-structure which binds together information to facilitate specific communication over one of the Internet protocols. The socket includes a unique id, details about the protocol (whether it is TCP or UDP etc.), whether it is set to be currently listening for connections (its state) and what the socket address of the host is. A client will typically have 1 socket whereas a server will have 1 listening socket and an additional socket for each established client connection.

Endpoints and Sockets: When programming a server or a client, we must bind (i.e. associate) the socket to a socket address (also known as an endpoint). An endpoint is a combination of IP address and port number, e.g:
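As a minimal sketch of how an endpoint is represented in C, the helper below fills a sockaddr_in with an IPv4 address and a port. The function name make_endpoint() is hypothetical and chosen only for illustration; port 50001 is the one used by the echo-socket example, and 127.0.0.1 (the local host) is an assumed address.

```c
#include <arpa/inet.h>   /* inet_pton(), htons() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdint.h>      /* uint16_t */
#include <string.h>      /* memset() */

/* Hypothetical helper: fill *addr with the endpoint ip:port.
 * Returns 0 on success, -1 if ip is not a valid dotted-decimal
 * IPv4 address. */
int make_endpoint(const char *ip, uint16_t port, struct sockaddr_in *addr)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;      /* IPv4 */
    addr->sin_port = htons(port);    /* port in network byte order */
    if (inet_pton(AF_INET, ip, &addr->sin_addr) != 1)
        return -1;
    return 0;
}
```

For example, make_endpoint("127.0.0.1", 50001, &addr) would describe the endpoint 127.0.0.1:50001.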
9.1.3 Example - echo-socket example (Lab 8)

An important thing that must be decided when writing a network program is your own application protocol: the rules that govern communication between the server and the client in your own code.

In this example, the server communicates with a client under a protocol summarised in the sequence diagram below.

The example uses synchronous TCP sockets.

Example - echo-socket (Lab 8):

Uses port 50001

In the code read() and write() will exist in pairs.

remember both read() and write() will be in both the client and the server program

each Send() in the sequence diagram should be interpreted as a read()/write() pair

with each write() at the sender there will be a corresponding read() at the receiver but this is not shown in the sequence diagram

you have to be careful to match read() and write() up across the sender and receiver

A good approach is to write a single interaction first, test it and incrementally add more.

Obviously you have to make sure the server is running before running the client...

Coding with sockets is very much a pattern.

don’t deviate from what you know works!
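The pairing of read() and write() can be sketched with a single echo interaction. This is only an illustration under assumptions: echo_once() and BUF_SIZE are hypothetical names, and the real application protocol of the Lab 8 echo-socket program may differ. The function reads one chunk from a connected socket and writes the same bytes straight back, so the peer must do the opposite: write() first, then read().

```c
#include <sys/socket.h>  /* socket types */
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* read(), write() */

#define BUF_SIZE 256     /* assumed buffer size, for illustration only */

/* Hypothetical helper: read one chunk from the connected socket fd and
 * write the same bytes back.
 * Returns the number of bytes written back, 0 if the peer closed the
 * connection, or -1 on error. Partial reads and writes are not handled
 * here (see the message boundary problem in 9.1.5). */
ssize_t echo_once(int fd)
{
    unsigned char buf[BUF_SIZE];

    ssize_t n = read(fd, buf, sizeof(buf));  /* matches a write() at the peer */
    if (n <= 0)
        return n;

    return write(fd, buf, (size_t)n);        /* matches a read() at the peer */
}
```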

When a client host wishes to make a connection, TCP uses a socket to send out a request message to the server machine defined in the socket address. The message is then passed to the IP layer, which assembles an IP packet for transmission to the destination.

use socket() to create a socket of correct type

create a socket address using sockaddr_in

use connect() to attempt a connection
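The three client steps above can be sketched as follows. This is a minimal illustration rather than the Lab 8 listing: connect_to_server() is a hypothetical name, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>   /* inet_pton(), htons() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdint.h>      /* uint16_t */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* socket(), connect() */
#include <unistd.h>      /* close() */

/* Hypothetical helper: connect to the server at ip:port.
 * Returns a connected socket descriptor, or -1 on failure. */
int connect_to_server(const char *ip, uint16_t port)
{
    /* 1. create a TCP (stream) socket */
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    /* 2. create the server's socket address */
    struct sockaddr_in serv_addr;
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &serv_addr.sin_addr) != 1) {
        close(sockfd);
        return -1;
    }

    /* 3. attempt the connection; blocks until it succeeds or fails */
    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```

A client for the echo-socket example might call connect_to_server("127.0.0.1", 50001), assuming the server runs on the same host, and close() the returned descriptor when finished.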
9.1.4 Using Stream Sockets

When the server receives the connection request, it returns a message containing its own unique id and socket address. This handshake identifies a virtual connection between the two processes.

use socket() to create a socket of correct type

create a socket address using sockaddr_in and this time listen on all network cards (network interfaces)

use bind() to associate socket address with socket and create the endpoint

use listen() to put server into listening state

use accept() to wait for connections

this blocks and returns when a client connects

accept() returns a new socket specific to the client
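The server steps above can be sketched in the same way. The names make_listening_socket() and wait_for_client() are hypothetical, and the backlog of 10 pending connections is an assumed value, not one taken from the lab code.

```c
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_ANY */
#include <arpa/inet.h>   /* htonl(), htons() */
#include <stdint.h>      /* uint16_t */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* socket(), bind(), listen(), accept() */
#include <unistd.h>      /* close() */

/* Hypothetical helper: create a TCP socket listening on the given port
 * on all network interfaces. Returns the listening socket, or -1. */
int make_listening_socket(uint16_t port)
{
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        return -1;

    struct sockaddr_in serv_addr;
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* all interfaces */
    serv_addr.sin_port = htons(port);

    /* bind() creates the endpoint; listen() enters the listening state */
    if (bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0 ||
        listen(listenfd, 10) < 0) {   /* backlog of 10 is an assumption */
        close(listenfd);
        return -1;
    }
    return listenfd;
}

/* Hypothetical helper: block until a client connects, then return a new
 * socket specific to that client (or -1 on error). */
int wait_for_client(int listenfd)
{
    return accept(listenfd, NULL, NULL);
}
```

The listening socket stays open for further connections; each socket returned by wait_for_client() should be closed once its client has been served.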

After the connection has been made the data can flow between the two hosts (called a data stream).

using read() and write() as appropriate

you need to code to coordinate these in client and server

note that read() and write() should be passed a buffer of unsigned char, which allows anything to be sent, i.e. text or binary data

So you have to be able to construct an unsigned char buffer from what you want to send (and reconstruct the original data type at the other end), which is called serialisation/de-serialisation
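As a hedged illustration of serialisation, the sketch below copies the fields of a small struct into an unsigned char buffer and back again. The struct reading and both function names are hypothetical. The bytes are copied in host byte order, so this assumes client and server run on the same architecture, as they do in this module.

```c
#include <stdint.h>  /* int32_t */
#include <string.h>  /* memcpy() */

/* Hypothetical data type we want to send over a socket. */
struct reading {
    int32_t id;
    double value;
};

/* Size of one serialised reading: the fields only, no struct padding. */
#define READING_WIRE_SIZE (sizeof(int32_t) + sizeof(double))

/* Serialise: copy each field into the byte buffer, one after the other. */
void serialise_reading(const struct reading *r, unsigned char *buf)
{
    memcpy(buf, &r->id, sizeof(r->id));
    memcpy(buf + sizeof(r->id), &r->value, sizeof(r->value));
}

/* De-serialise: rebuild the struct from the bytes, in the same order. */
void deserialise_reading(const unsigned char *buf, struct reading *r)
{
    memcpy(&r->id, buf, sizeof(r->id));
    memcpy(&r->value, buf + sizeof(r->id), sizeof(r->value));
}
```

The sender would pass the filled buffer and READING_WIRE_SIZE to write(); the receiver would read the same number of bytes and call deserialise_reading().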

TCP receivers send back acknowledgements for data that has been successfully received. The receiver can tell if there are any lost packets or whether the remote host has crashed and been restarted. Make sure the appropriate code is called to close sockets properly and free resources (see listings). The functions read() and write() both block: read() until at least some data has been received, and write() until the data has been accepted for sending. These are synchronous sockets. There are also non-blocking, asynchronous sockets, which are not considered in this module.
9.1.5 Sockets – Message Boundary Problem

Using basic sockets is attractive as the API is almost universally standard across languages and operating systems, and is well understood. However, there are a couple of problems with using raw read() and write(). Notice that both return an integer indicating how many bytes were actually sent or received (make sure you read the comments about read() in echo-socket client.c).

Consider, for example, that you attempt to send 1 KB of data. The write() function returns 1024, but the read() function in the other process returns, say, 512. You have to keep calling read() until you have received a total of 1024 bytes, and reassemble the data yourself. This is unlikely to happen if both processes are on the same host, but if you run them across a network it may (or may not) happen.

Luckily, reassembling the data for a given size is fairly straightforward. Kerrisk provides implementations of readn() and writen(), which we can use to make sure n bytes are fully read/transmitted; readn() calls read() repeatedly until the specified n bytes have been read, reassembling the data as it goes.

see Lab 8 message-socket-server (and client)

program – rdwrn.c and rdwrn.h

this is implemented in a separate code module (like hexdump() previously with the same Makefile).
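To show the idea behind readn() and writen(), here is a minimal sketch written for these notes. It is not Kerrisk's code nor the contents of rdwrn.c, and details such as the handling of EINTR are assumptions about reasonable behaviour.

```c
#include <errno.h>      /* errno, EINTR */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read(), write() */

/* Keep calling read() until n bytes have arrived.
 * Returns n on success, fewer than n if the peer closed the connection
 * early, or -1 on error. */
ssize_t readn(int fd, void *buffer, size_t n)
{
    unsigned char *buf = buffer;
    size_t total = 0;

    while (total < n) {
        ssize_t got = read(fd, buf + total, n - total);
        if (got == 0)               /* end of stream */
            return (ssize_t)total;
        if (got < 0) {
            if (errno == EINTR)     /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        total += (size_t)got;       /* next chunk lands after this one */
    }
    return (ssize_t)total;
}

/* Keep calling write() until all n bytes have been handed over.
 * Returns n on success or -1 on error. */
ssize_t writen(int fd, const void *buffer, size_t n)
{
    const unsigned char *buf = buffer;
    size_t total = 0;

    while (total < n) {
        ssize_t put = write(fd, buf + total, n - total);
        if (put <= 0) {
            if (put < 0 && errno == EINTR)
                continue;
            return -1;
        }
        total += (size_t)put;
    }
    return (ssize_t)total;
}
```

With these, the 1 KB example becomes a single readn(fd, buf, 1024) call on the receiving side, however many read() calls it takes underneath.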

However there remains a second problem with this: how does the receiving process know the value of n? There are three solutions:

send fixed length messages (poor)

send messages delimited by an agreed character

not good for binary data as you want to be able to send anything (data transparency) but fine for text

send 2 part messages

a header with the length of the payload data followed by the payload data itself

the header is a known fixed length (say 4 bytes)

It is best to send 2 part messages so the payload can be of arbitrary length, which we will do:

Lab 8 message-socket-client

Lab 8 message-socket-server

We still have to make sure that the length of the payload is transmitted first, but that length is a known fixed 4 bytes. The header should also be sent and received using writen() and readn(); there is no guarantee you will receive even 4 bytes in one go!
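A 2 part message can be sketched as below. This is an illustration under assumptions, not the Lab 8 message-socket code: send_message() and recv_message() are hypothetical names, the max_len limit is an added safety check, and the 4-byte header is converted with htonl()/ntohl() so that both ends agree on its byte order. Compact versions of readn() and writen() are repeated so the example stands alone.

```c
#include <arpa/inet.h>  /* htonl(), ntohl() */
#include <errno.h>
#include <stdint.h>     /* uint32_t */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>
#include <unistd.h>

static ssize_t readn(int fd, void *buffer, size_t n)
{
    unsigned char *buf = buffer;
    size_t total = 0;
    while (total < n) {
        ssize_t got = read(fd, buf + total, n - total);
        if (got == 0) break;                        /* peer closed */
        if (got < 0) { if (errno == EINTR) continue; return -1; }
        total += (size_t)got;
    }
    return (ssize_t)total;
}

static ssize_t writen(int fd, const void *buffer, size_t n)
{
    const unsigned char *buf = buffer;
    size_t total = 0;
    while (total < n) {
        ssize_t put = write(fd, buf + total, n - total);
        if (put <= 0) { if (put < 0 && errno == EINTR) continue; return -1; }
        total += (size_t)put;
    }
    return (ssize_t)total;
}

/* Send a 4-byte length header followed by the payload. Returns 0 or -1. */
int send_message(int fd, const void *payload, uint32_t len)
{
    uint32_t header = htonl(len);
    if (writen(fd, &header, sizeof(header)) != (ssize_t)sizeof(header))
        return -1;
    if (writen(fd, payload, len) != (ssize_t)len)
        return -1;
    return 0;
}

/* Receive one message: read the header, then exactly that many bytes.
 * Returns a malloc'd payload (caller frees) with its length in *len,
 * or NULL on error, early close, or a length greater than max_len. */
unsigned char *recv_message(int fd, uint32_t *len, uint32_t max_len)
{
    uint32_t header;
    if (readn(fd, &header, sizeof(header)) != (ssize_t)sizeof(header))
        return NULL;

    uint32_t n = ntohl(header);
    if (n > max_len)
        return NULL;

    unsigned char *buf = malloc(n ? n : 1);
    if (buf == NULL)
        return NULL;
    if (readn(fd, buf, n) != (ssize_t)n) {
        free(buf);
        return NULL;
    }
    *len = n;
    return buf;
}
```

Because the receiver learns the length from the header, the payload can be of any size up to the agreed limit and may contain any byte values.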
9.1.6 Byte Ordering – be aware!

Another issue with sockets is that the order of the bytes that make up a word (remember integers are typically 4 bytes) differs between architectures, for example little-endian and big-endian. If the byte order differs between the client and the server, the data will be misinterpreted unless care is taken! This is not a problem for us, as our C clients and server all run on Linux and on the same processor.

Sockets define host byte order and network byte order.

See: http://www.developer.com/net/cplus/article.php/3329251/Sockets-Byte-Ordering-Primer.htm
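For reference, network byte order is big-endian, and the standard functions htonl() and ntohl() convert a 32-bit value between host and network byte order. The sketch below shows one way to use them; the three helper names are hypothetical and exist only for this illustration.

```c
#include <arpa/inet.h>  /* htonl(), ntohl() */
#include <stdint.h>     /* uint32_t */
#include <string.h>     /* memcpy() */

/* Returns 1 if this host stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);   /* look at the byte at the lowest address */
    return first == 1;
}

/* Store value in buf in network byte order (most significant byte first),
 * whatever the byte order of this host. */
void put_u32_network(uint32_t value, unsigned char buf[4])
{
    uint32_t net = htonl(value);
    memcpy(buf, &net, sizeof(net));
}

/* Read back a value stored by put_u32_network(), converting it to the
 * byte order of this host. */
uint32_t get_u32_network(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof(net));
    return ntohl(net);
}
```

For example, put_u32_network(0x01020304, buf) leaves the bytes 01 02 03 04 in buf on both little-endian and big-endian hosts, so a receiver on either kind of machine recovers the same value with get_u32_network().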
9.1.7 Threads

We will improve our program after the next lecture by introducing threads, allowing multiple clients to connect to a server simultaneously. This gives a more elegant solution but nearly all the code introduced today can be reused with minor code changes. Therefore the coursework can be started now and threads incorporated later on in the development.

using Lab 8 message-socket-server (and client)

9.1.8 Prerequisites for next time

Reviewed code from Lab 8 (sockets directory):

simple TCP synchronous sockets (echo-socket)

solution to message boundary problem (message-socket-server and message-socket-client)
