08
  • 8.1 Internet protocols

    8.1.1 What is the Internet?

    A global heterogeneous network that connects a collection of computers all over the world, using transmission media (copper, fibre, wireless, etc.), special purpose devices (routers, gateways, switches, etc.), network operating systems (NOS) and applications software (email, web browsers, etc). The goal is to provide connectivity between machines and between users to:
    share resources
    increase reliability and availability
    collaborate (email, distributed computing, etc.)
    access remote information

    Thus, the Internet is a vehicle for transferring data from one host (machine) to another. A host has one or more network interfaces, i.e. network cards (or virtualised versions if it runs under VMware etc.). The network technology you are most likely to see is switched Ethernet.

  • 8.1.2 Hierarchical Structure of nodes

    Internet service providers (ISPs) are roughly structured in a hierarchical manner.

    Figure 1: Hierarchical structure of internet nodes

    At the lowest level are an organisation’s networks, e.g. the network of Glasgow Caledonian University. These local networks are sometimes called subnets, which are themselves usually split further into more subnets by network administrators to localise traffic and facilitate administration. However, the principles of operation are the same.

    Routers and gateways are networking devices that forward data packets (see below) between networks (subnets) toward their destinations: a router is used if the two networks use the same network technology, a gateway if they do not. A router/gateway holds a routing table containing information on where to send each packet next across the Internet.

    A switch (or hub) is a lower level device which forwards data between hosts within a single network. Switched Ethernet is an example of a network technology using switches.

    Therefore, these various nodes (routers, gateways etc) facilitate the movement of information ‘packages’.

  • 8.1.3 Packet Switching at nodes

    It is normal practice in computer networking to split a message into packets (i.e. equal sized pieces) when transmitting and then to reassemble them at the receiver into the original message, allowing:
    the memory buffering needs of equipment to be specified
    the independent routing of different packets
    only part of the message to be retransmitted if a packet is found to be absent or corrupted

    In Figure 2 we see an example of how packet switching works:
    source host generates a message and converts it to packets
    packets transferred independently across network
    destination router delivers packets to the destination host
    destination host rearranges received packets to retrieve submitted message
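
    The steps above can be sketched in a few lines of Python. This is only an illustration, not how IP is actually implemented: the packet size, the (sequence number, payload) tuple format and the function names `packetise` and `reassemble` are all assumptions made for the example. Shuffling the list stands in for packets taking independent routes and arriving out of order.

```python
import random

PACKET_SIZE = 8  # payload bytes per packet (illustrative value only)

def packetise(message: bytes, size: int = PACKET_SIZE) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, payload) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put packets back into sequence order and join their payloads."""
    return b"".join(payload for _seq, payload in sorted(packets))

message = b"Packets may take different routes across the network."
packets = packetise(message)
random.shuffle(packets)            # simulate out-of-order arrival
restored = reassemble(packets)
print(restored == message)
```

    Note that the final packet is usually shorter than the others, since the message length is rarely an exact multiple of the packet size.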

    Figure 2: Packet switching.
  • 8.1.4 Internet Applications

    An Internet application is a distributed system in which computations are performed by separate programs, normally running on separate pieces of hardware, that cooperate to perform the task of the system as a whole. Examples include:
    Electronic mail (e-mail)
    The World Wide Web (WWW) which uses HTTP as its protocol
    File transfer (FTP)
    Remote login such as Telnet and SSH
    Newsgroups
    Internet phone (VoIP)
    Real-time video conferencing
    Streaming audio and video
    Multi-user networked games
    Instant messaging
    P2P file sharing

    Internet application architectures are typically organised according to two common approaches:
    Client/Server model (C/S)
    one piece of the application acts as a server and another piece acts as a client
    the server program starts first and provides some service for clients that connect to it using a communication channel
    the client program requests services from the server; several clients can communicate with the server at the same time
    The sockets programs we will look at are simple examples of the client/server architecture.
    Peer-to-peer model (P2P)
    a system in which each program can act as both a client and a server for all the other programs
    each peer instance offers the same functionality

    Example Client / Server Interaction
    The server starts running.
    The server waits for clients to connect (listening).
    Clients start running and perform various operations, some of which require connection to the server to request a service.
    When a client attempts to connect, the server accepts the connection if it is willing.
    The server waits for messages to arrive from connected clients.
    Then the server takes some action in response and, typically, sends a message back to the client.
    Clients and servers continue functioning in this manner until one of them decides to shut down.
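
    As a rough sketch of this interaction, the Python program below runs a tiny server and one client in the same process, using a thread for the server. The helper names `serve_once` and `request`, the upper-casing "service" and the 1024-byte buffer are assumptions made for the illustration. A real server would loop to accept many clients and would not assume that a whole message arrives in a single `recv` call.

```python
import socket
import threading

HOST = "127.0.0.1"   # both ends run on this machine for the demonstration

def serve_once(server_sock: socket.socket) -> None:
    """Accept one client, reply to one message in upper case, then close."""
    conn, _addr = server_sock.accept()      # server accepts the connection
    with conn:
        data = conn.recv(1024)              # server waits for a message
        conn.sendall(data.upper())          # server acts and sends a reply

def request(port: int, message: bytes) -> bytes:
    """Connect to the server, send one message and return the reply."""
    with socket.create_connection((HOST, port)) as client:
        client.sendall(message)
        return client.recv(1024)

# The server starts first and listens; port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, 0))
server.listen()
port = server.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

reply = request(port, b"hello server")      # the client requests the service
worker.join()
server.close()                              # the server shuts down
print(reply)
```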

    Example Client / Server Interaction Diagram
  • 8.1.5 Communication Protocols

    Communication protocols are:

    “A set of specifications including formats, timing and rules that govern the functional operations of a telecommunications system in order to guarantee accurate and reliable transmission of data between stations on a network.”

    Communication protocols are implemented in both hardware and software, which is a very large and complex problem. Different protocols used to be written by each vendor for each application. This led to standardisation problems and made inter-communication between vendors difficult. Two things are therefore needed: standards, and a way of splitting the problem up.

    8.1.6 Layered Protocols

    The basic idea of a layered protocol is to split a previously unmanageable problem into manageable pieces by using a layered system. We are trying to make processes communicate sensibly with other processes running on networked computers with different architectures and different operating systems, so we must identify how the various problems can be isolated from one another. Each layer only interacts with the layers directly above and below it; therefore the interfaces between layers have to be accurately specified, well defined and unambiguous. One wants to be able to alter the implementation of a layer without altering its interface specification, and so without having to change any of the other layers. Different types of errors will be detected and corrected at each layer.

  • 8.1.7 Layered protocols

    5-layered protocol suite

    5. Application Layer: provides application programs e.g. file transfer, web access and email. Additionally provides APIs to write these types of application e.g. an API for implementing email type applications or web-based functionality.

    4. Transport Layer: network independent interface to application layer i.e. (if connection-oriented) provides a data-pipe: messages are split into equal sized segments which go in and come out undamaged in the correct order; routing of messages to processes. Segments are transmitted.

    3. Network Layer: control of transmission through whole Internet from sending to receiving host: packet switching and internetworking. Packets are transmitted.

    2. Link Layer: software control of point-to-point transmission; purpose is to provide an error free channel. Frames are transmitted.

    1. Physical Layer: electrical, optical and physical definitions; signal definitions; host/network connection characteristics. Bits are transmitted.

    Protocol Suite

    The protocol Stacks
    Actual data transmission is vertical (down left hand side of diagram and up right hand side of diagram)
    Although actual data transmission is vertical each layer is programmed as if it was horizontal: e.g. transport layer of sender “talks to” transport layer of receiver.
    Headers (TH etc.) are added at sender side to data through each layer until actual bits transmitted by physical layer; note (TH + transport layer data) becomes data of network layer and so on.
    At receiver headers are progressively stripped off to get original data back.
    The Link Layer checksum is an error checking mechanism.
    Protocol Stack
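
    The way headers are added on the way down the stack and stripped on the way up can be sketched as follows. This is a toy model: the text tags `TH|`, `NH|` and `LH|` stand in for real binary headers, the function names are invented for the example, and CRC-32 from Python's `zlib` module is used simply as a convenient checksum.

```python
import zlib

def encapsulate(data: bytes) -> bytes:
    """Sender side: wrap application data layer by layer."""
    segment = b"TH|" + data                  # transport header + data
    packet = b"NH|" + segment                # network header + segment
    checksum = zlib.crc32(packet).to_bytes(4, "big")
    return b"LH|" + packet + checksum        # link header + packet + checksum

def decapsulate(frame: bytes) -> bytes:
    """Receiver side: verify the checksum, then strip each header in turn."""
    packet, checksum = frame[3:-4], frame[-4:]   # drop LH| and split off checksum
    if zlib.crc32(packet).to_bytes(4, "big") != checksum:
        raise ValueError("frame corrupted")
    segment = packet[3:]                     # strip network header
    return segment[3:]                       # strip transport header

frame = encapsulate(b"hello")
print(frame[:9])                             # the three toy headers, outermost first
print(decapsulate(frame))
```

    Each layer treats everything handed down from the layer above as opaque data, which is why (TH + transport layer data) simply becomes the data part of the network layer packet.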
  • 8.1.8 Connectionless and Connection Oriented Protocols

    Network and transport layer protocols are often discussed in the following terms.

    Connectionless: the data is sent as a one-off packet, a datagram. The datagram is sent using a best-effort approach, i.e. there is no guarantee that it will be delivered; if there are any errors it is simply discarded. No connection is established between sender and receiver, so such a protocol runs faster and with lower overhead. It suits short one-off exchanges, e.g. a database query.

    Connection-oriented: a connection is established between sender and receiver for the duration of the message. Robust, error-correcting data transport is implemented; the message may consist of multiple segments/packets which are re-assembled into their correct order at the receiver. It is used, for example, for file transfer.

    Protocol Suite

    TCP, UDP and IP are the most common protocols in the suite, but be aware there are a few others...

    Protocol Suite
  • 8.1.9 IP Addresses and Port Numbers

    Every host on the Internet has a unique IP address, which is used to identify that specific host on the Internet. Port numbers allow different communication sessions on the one computer (same IP address) to be differentiated. The port number completes the destination address for a communication session; in other words, it is used to identify a specific process running on a host. Port numbers are added by TCP and UDP.

    IP Addresses: Every node (host or router) on the Internet is identified using a unique IP address, which has a fixed length:
    32 bits in length (IPv4)
    usually written in dotted decimal format (each 8 bits are separated by a dot).
    128 bits in length (IPv6)
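
    Python's standard `ipaddress` module can be used to see that a dotted decimal address is just a 32-bit number written one byte at a time. The address 192.168.1.10 below is an arbitrary example value.

```python
import ipaddress

addr = ipaddress.ip_address("192.168.1.10")     # arbitrary example address
as_int = int(addr)                               # the address as a single number

# Rebuild the same number by hand from the four 8-bit fields.
octets = [int(part) for part in str(addr).split(".")]
rebuilt = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

print(addr.version, addr.max_prefixlen)          # IP version and address length in bits
print(as_int, as_int == rebuilt)

v6 = ipaddress.ip_address("2001:db8::1")         # address from the IPv6 documentation range
print(v6.version, v6.max_prefixlen)
```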

    To ensure that no two machines mistakenly use the same IP address, the allocation of public IP addresses is coordinated centrally: historically by InterNIC, and today by IANA and the regional Internet registries.

    Loopback Address: This is the special IP address 127.0.0.1, for which you can also use the DNS name “localhost”. It refers to a virtual network interface. Data packets sent to the network interface at this IP address are immediately returned (they do not leave the sending host). So if you want to write clients and servers which communicate and are on the same host, use the loopback address. This is generally used for initial testing during software development and will work regardless of the real IP address of the host.

    DHCP (Dynamic Host Configuration Protocol): The IP network settings of a host (IP address, router address, DNS server) can be set up manually or automatically. DHCP sets them up automatically as part of the booting process: a dynamic IP address is one that is chosen for the host by the DHCP server when the host boots, and can therefore change. Manual settings are most often used for servers, which use static IP addresses so that they always have the same IP address.

    Using ifconfig : ifconfig is a command line program that comes with Linux. It can be used to display IP network information for each active network interface on the current system. It has many options, but the most commonly used is:

    ifconfig -a

    …which displays detailed IP network information for every network interface on the host on which it is run, including each interface's IP address, netmask and hardware (MAC) address.

    Alternatively for the host's own IP address:

    hostname -I

    on Windows use:

    ipconfig /all

  • 8.1.10 Protocols

    Internet Protocol (IP)
    Network Layer
    Unreliable and connectionless.
    Data transferred in IP packets.
    sometimes also called datagrams (but I will not use that term, to avoid confusion with UDP)
    Concerned primarily with Internet routing i.e. routing between subnets.
    Maximum packet size is 64KB: if data <= 64KB, one packet is sent; if data > 64KB, multiple packets are sent.
    Uses IP addresses.

    Transmission Control Protocol (TCP)
    Transport Layer
    Splits message into a sequence of data segments.
    each data segment forms the data part of an IP packet
    TCP reconstructs received segments into correct order.
    requests retransmission of lost and corrupt segments
    Stream abstraction (see next lecture)
    Flow control:
    the receiver can tell the transmitter that it cannot receive any more data at present. This typically happens when the receiver has filled-up its receiving buffer.
    Full-duplex (both ways at the same time), reliable and connection-oriented as participants establish connection before transmitting data.
    Adds port number.
    HTTP, FTP and SMTP use TCP.

    User Datagram Protocol (UDP)
    Transport Layer
    UDP datagram adds port number to a single IP packet.
    Datagram max. 64KB minus overheads
    Connectionless so runs with low overhead.
    nobody cares if datagram doesn’t arrive...
    typically sender tries again after timeout
    error detection i.e. if received datagram deemed corrupt just discarded – as if it never arrived
    DNS uses UDP.
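
    A minimal sketch of a connectionless exchange with Python datagram sockets is shown below, with sender and receiver in the same process on the loopback address 127.0.0.1. The message text and the 2 second timeout are arbitrary choices for the example. No connection is set up: the sender simply addresses each datagram to an (IP address, port) pair, and the receiver has to cope with the possibility that nothing arrives.

```python
import socket

HOST = "127.0.0.1"

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind((HOST, 0))            # port 0: let the OS pick a free port
receiver.settimeout(2.0)            # do not wait for ever for a lost datagram
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"what is the time?", (HOST, port))   # no connect, no handshake

try:
    data, source = receiver.recvfrom(65535)   # one call returns one whole datagram
except socket.timeout:
    data = None                     # a real sender would typically try again here
finally:
    sender.close()
    receiver.close()

print(data)
```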
  • 8.1.11 TCP and UDP Port Numbers

    Represented as a 16-bit unsigned integer (0 to 65535).

    Well-known ports
    0 to 1023
    reserved for specific server applications
    e.g. HTTP – port 80 or FTP – port 21
    do not use ports in this range (unless you are implementing these specific application layer protocols)!
    Registered ports
    1024 to 49151
    reserved for applications using specific registered ports
    e.g. Call of Duty – 28960
    best not to use in case you have applications running using these ports
    Dynamic ports
    49152 to 65535
    you can use – pick what you want...
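
    The three ranges can be captured in a small helper, which may be handy when checking a port number before using it in your own programs. The function name `port_range` is invented for this sketch.

```python
def port_range(port: int) -> str:
    """Return the name of the range a TCP/UDP port number falls in."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0 to 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"

for example in (80, 21, 28960, 50000):
    print(example, port_range(example))
```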

    TCP and UDP Summary:

    UDP
    Stands for User Datagram Protocol.
    Connectionless (i.e. no connection is established).
    Data is delivered in a datagram (a single packet).
    Datagrams are routed over the network until they reach their target.
    Unreliable: no guarantee of delivery; datagrams can be lost and also datagrams received corrupted are just discarded.
    Advantage: faster; thus more suitable for real-time applications.
    Disadvantage: limited message length; lose benefits of robust communication and unlimited message length of TCP.
    TCP
    Stands for Transmission Control Protocol.
    Connection oriented (i.e. a connection is first established before data exchange begins).
    Data bytes are delivered as streams (in sequence).
    Data bytes are split into multiple segments which are routed over the network until they reach their target.
    Provides error and flow control to ensure packets reach target reliably (i.e. in order sent without errors).
    Connection is terminated once one of the communicating devices requests it.
    Disadvantage: slower especially if the network generates errors forcing retransmission.
  • 8.1.12 Sockets

    Sockets are the basic API for writing networked applications:
    Stream sockets (TCP) – Transport Layer
    Datagram sockets (UDP) – Transport Layer
    Raw sockets (IP) – Network Layer

    Application layer protocols also have their own API libraries (themselves implemented using sockets). The sockets APIs are fairly standard across languages and OSs. We will limit our discussions to stream sockets (i.e. TCP sockets) in this module. A typical stream sockets application behaves like the Client/Server Interaction example shown above.

    Application Layer: Looking back at our “Protocol Stacks” diagram you should note in the programs we will write the application layer is omitted. We are coding TCP sockets at the transport layer. In fact, you are providing the application layer in your application, which is a perfectly OK thing to do! A good example of an application layer protocol is HTTP. HTTP transports web requests and adds HTTP headers. An HTTP API is often available which a programmer can use to create HTTP requests. This adds HTTP headers automatically and uses TCP sockets underneath.

    Domain Name System (DNS): DNS provides a means of mapping between a numeric IP address (which is hard to remember) and a hierarchical string (which is easier to remember). For example, would you rather try to remember www.google.com or 216.239.39.100?

    DNS Resolvers: A resolver uses name resolution to query DNS name servers. Operation is transparent to applications. This manifests itself in two ways – by an API call (included in sockets libraries) and by the nslookup command line tool provided in most modern operating systems.

    try:

    nslookup www.ebay.co.uk

    “non-authoritative” indicates that this information is obtained from the cache of the local DNS server rather than the one responsible for the mapping
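
    The API route to the resolver can be tried from Python with `socket.getaddrinfo`. The wrapper function `resolve` is invented for this sketch, and the name “localhost” is used so that the example works without a network connection; substituting a public host name would query the configured DNS servers instead.

```python
import socket

def resolve(name: str) -> list[str]:
    """Return the distinct IP addresses the resolver reports for a name."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the IP address is the first item of sockaddr.
    return sorted({info[4][0] for info in infos})

addresses = resolve("localhost")
print(addresses)
```

    The result may contain IPv4 addresses, IPv6 addresses or both, depending on how the host is configured.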

  • 8.1.13 Coding with the Internet Protocols

    Internet Protocols
    As software developers remember:
    the transport layer is the lowest layer we usually implement applications with i.e. TCP and UDP sockets
    the application layer protocols are themselves implemented with sockets – these too have their own class libraries (or API libraries) to write higher level applications
    the higher the level of the software layer, the easier it is to write applications but you lose expressive power – this is a tradeoff