-
Internet protocols
- This week the topics we will cover include:
- The Internet
- Client/Server and Peer-to-peer
- Communication Protocols
- Layered Protocols and Protocol Stacks
- The Internet Protocol Suite
- TCP, UDP and IP
- IP Addresses and Port Numbers
- DNS
-
8.1 Internet protocols
8.1.1 What is the Internet?
- A global heterogeneous network that connects computers all over the world, using transmission media (copper, fibre, wireless, etc.), special purpose devices (routers, gateways, switches, etc.), network operating systems (NOS) and applications software (email, web browsers, etc.). The goal is to provide connectivity between machines and between users to:
- share resources
- increase reliability and availability
- collaborate (email, distributed computing, etc.)
- access remote information
Thus, the Internet is a vehicle for transferring data from one host (machine) to another. A host has one or more network interfaces, i.e. network cards (or virtualised equivalents if it runs under VMware or similar). The network technology you are most likely to see is switched Ethernet.
-
8.1.2 Hierarchical Structure of nodes
Internet service providers (ISPs) are roughly structured in a hierarchical manner.
At the lowest level are an organisation’s networks, e.g. the network of Glasgow Caledonian University. These local networks are sometimes called subnets, which are themselves usually split further into more subnets by network administrators to localise traffic and facilitate administration. However, the principles of operation are the same.
Routers and gateways are computer networking devices that forward data packets (see below) between networks (subnets) toward their destinations: a router is used if the two networks use the same network technology, a gateway if they do not. A router or gateway holds a routing table containing information on where to send each packet next on its way across the Internet.
A switch (or hub) is a lower level device which forwards data between hosts within a single network. Switched Ethernet is an example of a network technology using switches.
Therefore, these various nodes (routers, gateways etc) facilitate the movement of information ‘packages’.
-
8.1.3 Packet Switching at nodes
- It is normal practice in computer networking to split a message into packets (i.e. pieces of limited size) when transmitting and then to reassemble them at the receiver into the original message, allowing:
- the memory buffering needs of equipment to be specified
- the independent routing of different packets
- only part of the message to be retransmitted if a packet is found to be absent or corrupted
- In Figure 2 we see an example of how packet switching works:
- source host generates a message and converts it to packets
- packets transferred independently across network
- destination router delivers packets to the destination host
- destination host rearranges received packets to retrieve submitted message
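The four steps above can be sketched in a few lines of Python. This is only an illustration, not how a real network stack is written: the function names `packetise` and `reassemble`, the use of a sequence number as the only header field, and the shuffle that stands in for independent routing are all assumptions made for the example.

```python
import random


def packetise(message: bytes, payload_size: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, payload) packets.

    Every packet carries at most payload_size bytes; the last one may be shorter.
    """
    return [
        (seq, message[start:start + payload_size])
        for seq, start in enumerate(range(0, len(message), payload_size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put packets back into sequence-number order and join their payloads."""
    return b"".join(payload for _seq, payload in sorted(packets))


if __name__ == "__main__":
    message = b"Packets may arrive in any order."
    packets = packetise(message, payload_size=8)
    random.shuffle(packets)  # stand-in for packets taking different routes
    assert reassemble(packets) == message
    print(len(packets), "packets reassembled correctly")
```

The sequence number is what lets the destination host restore the original order, and it is also what would let a receiver notice that one particular packet is missing and ask for just that packet again.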
-
8.1.4 Internet Applications
- An Internet application is a distributed system in which computations are performed by separate programs, normally running on separate pieces of hardware, that cooperate to perform the task of the system as a whole. Examples include:
- Electronic mail (e-mail)
- The World Wide Web (WWW) which uses HTTP as its protocol
- File transfer (FTP)
- Remote login such as Telnet and SSH
- Newsgroups
- Internet phone (VoIP)
- Real-time video conferencing
- Streaming audio and video
- Multi-user networked games
- Instant messaging
- P2P file sharing
- Internet application architectures are typically organised according to two common approaches:
- Client/Server model (C/S)
- one piece of the application acts as a server and another piece acts as a client
- the server program starts first and provides some service for clients that connect to it using a communication channel
- the client program requests services from the server; several clients can communicate with the server at the same time
- The sockets programs we will look at are simple examples of the client/server architecture.
- Peer-to-peer model (P2P)
- a system in which each program can act as both a client and a server for all the other programs
- each peer instance offers the same functionality
- Example Client / Server Interaction
- The server starts running.
- The server waits for clients to connect (listening).
- Clients start running and perform various operations, some of which require connection to the server to request a service.
- When a client attempts to connect, the server accepts the connection if it is willing.
- The server waits for messages to arrive from connected clients.
- Then the server takes some action in response and, typically, sends a message back to the client.
- Clients and servers continue functioning in this manner until one of them decides to shut down.
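As a rough illustration of the interaction described above, here is a minimal Python sketch using TCP sockets on the loopback address. The function names (`serve_once`, `request`, `demo`) and the "reply in upper case" service are made up for the example; a real server would loop to accept many clients and would handle errors.

```python
import socket
import threading

HOST = "127.0.0.1"  # loopback, so client and server run on the same machine


def serve_once(server_sock: socket.socket) -> None:
    """Accept one client, answer one message in upper case, then close."""
    conn, _addr = server_sock.accept()  # blocks until a client connects
    with conn:
        data = conn.recv(1024)          # wait for a message from the client
        conn.sendall(data.upper())      # take some action and send a reply


def request(port: int, message: bytes) -> bytes:
    """Connect to the server, send one short message and return the reply."""
    with socket.create_connection((HOST, port)) as client:
        client.sendall(message)
        # A single recv is enough for this tiny message; longer messages
        # would need a loop because TCP delivers a byte stream.
        return client.recv(1024)


def demo(message: bytes = b"hello") -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind((HOST, 0))     # port 0: let the OS pick a free port
        server_sock.listen()            # the server is now listening
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()                  # server waits in the background
        reply = request(port, message)  # client connects and makes a request
        worker.join()
        return reply


if __name__ == "__main__":
    print(demo())
```

The server is started first and listens before any client connects, which mirrors the order of the steps in the list. Running both ends in one process with a thread is only a convenience for the demonstration.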
-
8.1.5 Communication Protocols
Communication protocols are:
“A set of specifications including formats, timing and rules that govern the functional operations of a telecommunications system in order to guarantee accurate and reliable transmission of data between stations on a network.”
Communication protocols are implemented in both hardware and software, which is a very large and complex problem. Different protocols used to be written by each vendor for each application. This led to standardisation problems and made inter-communication between vendors difficult. The solution is to agree on standards and to split the problem up into smaller parts.
8.1.6 Layered Protocols
The basic idea of a layered protocol is to split a previously unmanageable problem into manageable pieces by using a layered system. We are trying to make processes communicate sensibly with other processes running on networked computers with different architectures and different operating systems, so we must identify how the various problems can be isolated from one another. Each layer only interacts with those directly above and below it; therefore the interfaces between layers have to be accurately specified, well defined and unambiguous. One wants to be able to alter the implementation of a layer without altering its interface specification and without having to change any of the other layers. Different types of errors will be detected and corrected at each layer.
-
8.1.7 Layered protocols
5-layered protocol suite
5. Application Layer: provides application programs e.g. file transfer, web access and email. Additionally provides APIs to write these types of application e.g. an API for implementing email type applications or web-based functionality.
4. Transport Layer: network independent interface to application layer i.e. (if connection-oriented) provides a data-pipe: messages are split into equal sized segments which go in and come out undamaged in the correct order; routing of messages to processes. Segments are transmitted.
3. Network Layer: control of transmission through whole Internet from sending to receiving host: packet switching and internetworking. Packets are transmitted.
2. Link Layer: software control of point-to-point transmission; purpose is to provide an error free channel. Frames are transmitted.
1. Physical Layer: electrical, optical and physical definitions; signal definitions; host/network connection characteristics. Bits are transmitted.
- The protocol Stacks
- Actual data transmission is vertical (down left hand side of diagram and up right hand side of diagram)
- Although actual data transmission is vertical each layer is programmed as if it was horizontal: e.g. transport layer of sender “talks to” transport layer of receiver.
- Headers (TH etc.) are added to the data at the sender side as it passes down through each layer, until the actual bits are transmitted by the physical layer; note that (TH + transport layer data) becomes the data of the network layer, and so on.
- At the receiver, headers are progressively stripped off to get the original data back.
- The Link Layer checksum is an error-checking mechanism.
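The way headers are added on the way down and stripped on the way up can be sketched as follows. This is a toy model only: the three-byte text labels `TH|`, `NH|` and `LH|` stand in for real transport, network and link headers, and a CRC-32 trailer is assumed as the link layer checksum purely for illustration.

```python
import zlib

TH, NH, LH = b"TH|", b"NH|", b"LH|"  # placeholder headers, 3 bytes each


def encapsulate(data: bytes) -> bytes:
    """Sender side: wrap application data in one header per layer."""
    segment = TH + data                # transport layer: header + data
    packet = NH + segment              # network layer: the segment is its data
    checksum = zlib.crc32(packet).to_bytes(4, "big")
    return LH + packet + checksum      # link layer: header + packet + checksum


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: verify the checksum, then strip headers layer by layer."""
    if not frame.startswith(LH):
        raise ValueError("not a frame")
    packet, checksum = frame[len(LH):-4], frame[-4:]
    if zlib.crc32(packet).to_bytes(4, "big") != checksum:
        raise ValueError("frame corrupted")  # detected at the link layer
    if not packet.startswith(NH + TH):
        raise ValueError("unexpected headers")
    segment = packet[len(NH):]         # network layer strips its header
    return segment[len(TH):]           # transport layer strips its header


if __name__ == "__main__":
    frame = encapsulate(b"hello")
    print(frame[:9])                   # the three stacked headers
    print(decapsulate(frame))
```

Each layer treats everything handed down from the layer above as opaque data, which is why the transport header ends up innermost and the link header outermost.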
-
8.1.8 Connectionless and Connection Oriented Protocols
Network and transport layer protocols are often discussed in the following terms.
- Connectionless: the data is sent as a one-off packet, a datagram. The datagram is sent using a best-effort approach, i.e. there is no guarantee that it will be delivered, and a datagram that arrives with errors is simply discarded. Because no connection is established between sender and receiver, such a protocol runs faster and with low overhead, which suits short exchanges such as a database query.
- Connection-oriented: a connection is established between sender and receiver for the duration of the message. Robust, error-corrected data transport is implemented; the message may consist of multiple segments/packets, which are reassembled in their correct order at the receiver. This is used, for example, for file transfer.
Protocol Suite
TCP, UDP and IP are the most common protocols in the suite, but be aware there are a few others...
-
8.1.9 IP Addresses and Port Numbers
Every host on the Internet has a unique IP address, which is used to identify that specific host on the Internet. Port numbers allow different communication sessions on the one computer (same IP address) to be differentiated. The port number completes the destination address for a communication session; in other words, it identifies a specific process running on a host. Port numbers are added by TCP and UDP.
- IP Addresses: Every node (host or router) on the Internet is identified using a unique IP address of a fixed length:
- 32 bits in length (IPv4)
- usually written in dotted decimal format (each 8 bits are separated by a dot).
- 128 bits in length (IPv6)
To ensure that no two machines are mistakenly using the same IP address, the allocation of IP addresses is coordinated centrally, today by IANA and the regional Internet registries (historically by InterNIC).
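To make the address lengths concrete, the short sketch below uses Python's standard `ipaddress` module to show that a dotted decimal IPv4 address is just a 32-bit number written one byte at a time. The helper name `describe` is an assumption for the example.

```python
import ipaddress


def describe(address: str) -> tuple[int, int, int]:
    """Return (IP version, address length in bits, address as an integer)."""
    ip = ipaddress.ip_address(address)
    return ip.version, ip.max_prefixlen, int(ip)


if __name__ == "__main__":
    for text in ("127.0.0.1", "192.168.0.1", "::1"):
        version, bits, value = describe(text)
        print(f"{text}: IPv{version}, {bits} bits, integer value {value}")

    # Each dotted decimal field is one byte (8 bits) of the 32-bit address.
    print(list(ipaddress.ip_address("192.168.0.1").packed))
```

The addresses used here are the loopback address and a common private address, chosen only because they are safe to experiment with.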
Loopback Address: This is the special IP address 127.0.0.1, which can also be reached through the DNS name “localhost”. It belongs to a virtual network interface: data packets sent to the network interface at this IP address are immediately returned (they do not leave the sending host). So if you want to write clients and servers which communicate and are on the same host, use the loopback address. This is generally used for initial testing during software development and will work regardless of the real IP address of the host.
DHCP (Dynamic Host Configuration Protocol): The IP network settings can be set up manually or automatically on a host. The settings include the IP address, router address and DNS server, and are applied as part of the boot process. DHCP sets up dynamic IP addresses and the other network settings automatically; a dynamic IP address is one that is chosen for the host by the DHCP server when the host boots, and can therefore change. Manual configuration is most often used for servers, which use static IP addresses so that they always have the same IP address.
Using ifconfig: ifconfig is a command line program that comes with Linux. It can be used to display IP network information for the network interfaces on the current system. It has many options, but the most commonly used is:
ifconfig -a
…which displays detailed information about every network interface on the host on which it is run, including each interface's IP address, netmask and hardware address.
Alternatively, for the host's own IP address(es):
hostname -I
On Windows use:
ipconfig /all
-
8.1.10 Protocols
- Internet Protocol (IP)
- Network Layer
- Unreliable and connectionless.
- Data transferred in IP packets.
- sometimes also called datagrams (but I will not use that term, to avoid confusion with UDP)
- Concerned primarily with Internet routing, i.e. routing between subnets
- Maximum packet size is 64KB: so if data <= 64KB, one packet is sent; if data > 64KB, multiple packets are sent
- Uses IP addresses.
- Transmission Control Protocol (TCP)
- Transport Layer
- Splits message into a sequence of data segments.
- each data segment forms the data part of an IP packet
- TCP reconstructs received segments into correct order.
- requests retransmission of lost and corrupt segments
- Stream abstraction (see next lecture)
- Flow control:
- the receiver can tell the transmitter that it cannot receive any more data at present. This typically happens when the receiver has filled-up its receiving buffer.
- Full-duplex (both ways at the same time), reliable and connection-oriented as participants establish connection before transmitting data.
- Adds port number.
- HTTP, FTP and SMTP use TCP.
- User Datagram Protocol (UDP)
- Transport Layer
- UDP datagram adds port number to a single IP packet.
- Datagram max. 64KB minus overheads
- Connectionless so runs with low overhead.
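- UDP itself takes no action if a datagram doesn’t arrive
- typically the sending application tries again after a timeout
- error detection only, i.e. a received datagram deemed corrupt is just discarded, as if it never arrived
- DNS uses UDP for most queries (it can fall back to TCP, e.g. for large responses).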
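For contrast with TCP, the following sketch sends a single UDP datagram over the loopback address. No connection is set up, and the receiver simply gives up after a timeout if nothing arrives. The function name `udp_round_trip` and the one second timeout are assumptions made for the example.

```python
import socket
from typing import Optional


def udp_round_trip(message: bytes, timeout: float = 1.0) -> Optional[bytes]:
    """Send one datagram to a local receiver; return it, or None if it is lost."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(timeout)      # UDP will not tell us about a loss
        # No connect/accept step: the datagram is simply addressed and sent.
        sender.sendto(message, receiver.getsockname())
        try:
            data, _addr = receiver.recvfrom(65535)
        except socket.timeout:
            return None                   # the application must decide to retry
        return data


if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
```

On the loopback interface the datagram will almost always arrive, but the code still has to allow for the case where it does not, because UDP gives no delivery guarantee.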
-
8.1.11 TCP and UDP Port Numbers
Port numbers are represented as a 16-bit unsigned integer (0 to 65535).
- Well-known ports
- 0 to 1023
- reserved for specific server applications
- e.g. HTTP – port 80 or FTP – port 21
- do not use ports in this range (unless you are implementing these specific application layer protocols)!
- Registered ports
- 1024 to 49151
- reserved for applications using specific registered ports
- e.g. Call of Duty – 28960
- best not to use in case you have applications running using these ports
- Dynamic ports
- 49152 to 65535
- you can use – pick what you want...
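The three ranges above can be captured in a small helper. The function name `port_range` is hypothetical; the boundaries are the ones listed above.

```python
def port_range(port: int) -> str:
    """Classify a TCP/UDP port number into the three ranges listed above."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0 to 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"


if __name__ == "__main__":
    for port in (21, 80, 28960, 49152):
        print(port, port_range(port))
```

A check like this can be useful when choosing a port for your own test servers, since it makes it obvious when a number falls in a range reserved for other applications.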
TCP and UDP Summary:
- UDP
- Stands for User Datagram Protocol.
- Connectionless (i.e. no connection is established).
- Data is delivered in a datagram (a single packet).
- Datagrams are routed over the network until they reach their target.
- Unreliable: no guarantee of delivery; datagrams can be lost, and datagrams that arrive corrupted are just discarded.
- Advantage: faster; thus more suitable for real-time applications.
- Disadvantage: limited message length; you lose the robust communication and unlimited message length that TCP provides.
- TCP
- Stands for Transmission Control Protocol.
- Connection oriented (i.e. a connection is first established before data exchange begins).
- Data bytes are delivered as streams (in sequence).
- Data bytes are split into multiple segments which are routed over the network until they reach their target.
- Provides error and flow control to ensure packets reach target reliably (i.e. in order sent without errors).
- Connection is terminated once one of the communicating devices requests it.
- Disadvantage: slower especially if the network generates errors forcing retransmission.
-
8.1.12 Sockets
- Sockets are the basic API for writing networked applications:
- Stream sockets (TCP) – Transport Layer
- Datagram sockets (UDP) – Transport Layer
- Raw sockets (IP) – Network Layer
Application layer protocols also have their own API libraries (themselves implemented using sockets). The sockets APIs are fairly standard across languages and OSs. We will limit our discussions to stream sockets (i.e. TCP sockets) in this module. A typical stream sockets application behaves like the Client/Server Interaction example shown above.
Application Layer: Looking back at our “Protocol Stacks” diagram you should note in the programs we will write the application layer is omitted. We are coding TCP sockets at the transport layer. In fact, you are providing the application layer in your application, which is a perfectly OK thing to do! A good example of an application layer protocol is HTTP. HTTP transports web requests and adds HTTP headers. An HTTP API is often available which a programmer can use to create HTTP requests. This adds HTTP headers automatically and uses TCP sockets underneath.
Domain Name System (DNS): DNS provides a means of mapping between a numeric IP address (which is hard to remember) and a hierarchical string (which is easier to remember). For example, would you rather try to remember www.google.com or 216.239.39.100?
DNS Resolvers: A resolver uses name resolution to query DNS name servers. Operation is transparent to applications. This manifests itself in two ways – by an API call (included in sockets libraries) and by the nslookup command line tool provided in most modern operating systems.
Try:
nslookup www.ebay.co.uk
In the output, “non-authoritative” indicates that the information was obtained from the cache of the local DNS server rather than from the server responsible for the mapping.
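The API route to the resolver can be sketched with Python's standard `socket.getaddrinfo` call. The helper name `resolve_ipv4` is an assumption for the example, and "localhost" is used so that the sketch runs without Internet access; any other name could be passed instead.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Ask the system resolver for the IPv4 addresses of a host name."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve_ipv4("localhost"))
```

The application never talks to a DNS server directly here; it hands the name to the resolver and gets addresses back, which is what "transparent to applications" means in practice.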
-
8.1.13 Coding with the Internet Protocols
- As software developers remember:
- the transport layer is the lowest layer we usually implement applications with i.e. TCP and UDP sockets
- the application layer protocols are themselves implemented with sockets – these too have their own class libraries (or API libraries) to write higher level applications
- the higher the level of the software layer, the easier it is to write applications but you lose expressive power – this is a tradeoff