This time in our userspace TCP/IP stack we will implement a minimum viable IP layer and test it with ICMP echo requests (also known as pings).
We will take a look at the formats of IPv4 and ICMPv4 and describe how to check them for integrity. Some features, such as IP fragmentation, are left as an exercise for the reader.
For our networking stack IPv4 was chosen over IPv6 since it is still the default network protocol for the Internet. However, this is changing fast1 and our networking stack can be extended with IPv6 in the future.
- Internet Protocol version 4
- Internet Control Message Protocol version 4
- Testing the implementation
Internet Protocol version 4
The next layer (L3)2 in our implementation, after Ethernet frames, handles the delivery of data to a destination. The Internet Protocol (IP) was designed to provide a base for transport protocols such as TCP and UDP. It is connectionless, meaning that, unlike in TCP, every datagram is handled independently of the others in the networking stack. This also means that IP datagrams may arrive out of order.3
Furthermore, IP does not guarantee successful delivery. This is a conscious choice made by the protocol designers, since IP is also meant to serve as a base for protocols that likewise do not guarantee delivery. UDP is one such protocol.
If reliability between the communicating parties is required, a protocol such as TCP is used on top of IP. In that case, the higher level protocol is responsible for detecting missing data and making sure all of it is delivered.
The IPv4 header is typically 20 octets in length. The header can contain trailing options, but they are omitted from our implementation. The meaning of the fields is relatively straightforward and can be described as a C struct:
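A minimal sketch of such a struct follows. The struct name `ipv4_hdr`, the combined `version_ihl` and `flags_frag_offset` members, and the accessor helpers are assumptions made here to avoid compiler-dependent bit-field layouts; the remaining member names follow the field names used in this post. Multi-byte members hold values in network byte order when the struct is overlaid on received data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a 20-octet IPv4 header without options.
 * version_ihl and flags_frag_offset are hypothetical combined members:
 * the 4-bit and 3/13-bit fields are extracted with shifts and masks
 * instead of bit-fields, whose layout is compiler-dependent. */
struct ipv4_hdr {
    uint8_t  version_ihl;        /* version (high 4 bits), ihl (low 4 bits) */
    uint8_t  tos;                /* type of service */
    uint16_t len;                /* total length, network byte order */
    uint16_t id;                 /* identification, network byte order */
    uint16_t flags_frag_offset;  /* flags (3 bits) + fragment offset (13 bits) */
    uint8_t  ttl;                /* time to live */
    uint8_t  proto;              /* payload protocol */
    uint16_t csum;               /* header checksum */
    uint32_t saddr;              /* source address */
    uint32_t daddr;              /* destination address */
};

/* With natural alignment this layout has no padding. */
static_assert(sizeof(struct ipv4_hdr) == 20,
              "IPv4 header without options must be 20 octets");

static inline uint8_t ipv4_version(const struct ipv4_hdr *h)
{
    return (uint8_t)(h->version_ihl >> 4);
}

static inline uint8_t ipv4_ihl(const struct ipv4_hdr *h)
{
    return (uint8_t)(h->version_ihl & 0x0f);
}

/* Header length in octets: ihl counts 32-bit words. */
static inline size_t ipv4_hdr_len(const struct ipv4_hdr *h)
{
    return (size_t)ipv4_ihl(h) * 4;
}
```

For a first octet of 0x45, `ipv4_version` yields 4 and `ipv4_hdr_len` yields 20.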
The 4-bit version field indicates the format of the Internet header. In our case, the value will be 4 for IPv4.
The Internet Header Length field, ihl, is likewise 4 bits in length and indicates the number of 32-bit words in the IP header. Because the field is 4 bits in size, it can only hold a maximum value of 15. Thus the maximum length of an IP header is 60 octets (15 words times 32 bits, divided by 8 bits per octet).
The type of service field, tos, originates from the first IP specification4. It has been divided into smaller fields in later specifications, but for simplicity’s sake, we will treat the field as defined in the original specification. The field communicates the quality of service intended for the IP datagram.
The total length field, len, communicates the length of the whole IP datagram. As it is a 16-bit field, the maximum length is 65535 bytes. Large IP datagrams are subject to fragmentation, in which they are split into smaller datagrams in order to satisfy the Maximum Transmission Unit (MTU) of different communication interfaces.
The id field is used to identify the datagram and is ultimately used for reassembly of fragmented IP datagrams. The field’s value is simply a counter that is incremented by the sending party. In turn, the receiving side knows which incoming fragments belong to the same datagram.
The flags field defines various control flags of the datagram. Specifically, the sender can specify whether the datagram is allowed to be fragmented, and whether it is the last fragment or more fragments are incoming.
The fragment offset field, frag_offset, indicates the position of the fragment in a datagram. Naturally, the first fragment has this offset set to 0.
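The flags and the fragment offset share one 16-bit word of the header: 3 bits of flags followed by 13 bits of offset, with the offset counted in 8-octet units. The sketch below decodes that word from its two wire bytes. The names `frag_info`, `decode_frag` and the `IP_FLAG_*` masks are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP_FLAG_DF          0x4000u  /* Don't Fragment */
#define IP_FLAG_MF          0x2000u  /* More Fragments */
#define IP_FRAG_OFFSET_MASK 0x1fffu  /* low 13 bits */

/* Hypothetical decoded view of the flags/fragment-offset word. */
struct frag_info {
    bool     dont_fragment;
    bool     more_fragments;
    uint16_t offset_bytes;   /* fragment position in octets */
};

/* field points at octets 6 and 7 of the IPv4 header (network byte order). */
static struct frag_info decode_frag(const uint8_t field[2])
{
    struct frag_info info;
    uint16_t v;

    assert(field != NULL);
    v = (uint16_t)((field[0] << 8) | field[1]);

    info.dont_fragment  = (v & IP_FLAG_DF) != 0;
    info.more_fragments = (v & IP_FLAG_MF) != 0;
    /* The wire value counts 8-octet units; the maximum is 8191 * 8 = 65528. */
    info.offset_bytes   = (uint16_t)((v & IP_FRAG_OFFSET_MASK) * 8u);
    return info;
}
```

A datagram is unfragmented when `more_fragments` is false and `offset_bytes` is 0; the last fragment of a fragmented datagram has `more_fragments` false and a non-zero offset.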
The ttl, or time to live, field is used to count down the datagram’s lifetime. It is usually set to 64 by the original sender, and every receiver decrements this counter by one. When it hits zero, the datagram is to be discarded and possibly an ICMP message is sent back to indicate an error.
The proto field provides the datagram an inherent ability to carry other protocols in its payload. The field usually contains values such as 17 (UDP) or 6 (TCP), and is simply used to communicate the type of the actual data to the receiver.
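As a rough illustration of how a receiver might act on this field, the sketch below reads the protocol octet (offset 9 in the header) and decides which handler should see the payload. The enum names and `ip_classify` are hypothetical; the numeric values 1, 6 and 17 are the assigned protocol numbers for ICMP, TCP and UDP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assigned protocol numbers carried in the proto field. */
enum ip_proto {
    IP_PROTO_ICMP = 1,
    IP_PROTO_TCP  = 6,
    IP_PROTO_UDP  = 17
};

/* Hypothetical result of classification: which layer handles the payload. */
enum ip_next {
    IP_NEXT_ICMP,
    IP_NEXT_TCP,
    IP_NEXT_UDP,
    IP_NEXT_DROP   /* protocol not supported by this stack */
};

/* hdr points at the first octet of an IPv4 header of at least 20 octets. */
static enum ip_next ip_classify(const uint8_t *hdr)
{
    assert(hdr != NULL);

    switch (hdr[9]) {             /* proto is the tenth octet */
    case IP_PROTO_ICMP: return IP_NEXT_ICMP;
    case IP_PROTO_TCP:  return IP_NEXT_TCP;
    case IP_PROTO_UDP:  return IP_NEXT_UDP;
    default:            return IP_NEXT_DROP;
    }
}
```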
The header checksum field, csum, is used to verify the integrity of the IP header. The algorithm for it is relatively simple, and will be explained further down in this tutorial.
The saddr and daddr fields indicate the source and destination addresses of the datagram, respectively. Even though the fields are 32 bits in length and thus provide a pool of approximately 4.3 billion addresses, the address range is going to be exhausted in the near future5. The IPv6 protocol extends this length to 128 bits and as a result, future-proofs the address range of the Internet Protocol, perhaps permanently.
The Internet checksum field is used to check the integrity of the IP header. Calculating the checksum is relatively simple and is defined in the original specification4:
The checksum field is the 16 bit one’s complement of the one’s complement sum of all 16 bit words in the header. For purposes of computing the checksum, the value of the checksum field is zero.
The actual6 code for the algorithm is as follows:
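A minimal sketch of the algorithm, written against a plain byte buffer so that it does not depend on host byte order. The function name `inet_checksum` and its signature are assumptions for this illustration, not necessarily what the accompanying source code uses. The caller is expected to zero the checksum field before computing, as the specification requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's complement of the one's complement sum of all 16-bit words.
 * Words are read as big-endian, so the result is the numeric value whose
 * high byte goes first on the wire. A trailing odd byte is padded with
 * zero. The 32-bit accumulator cannot overflow for inputs up to 65535
 * octets, which covers any IPv4 datagram. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);

    /* Plain (two's complement) sum of the 16-bit words. */
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }

    /* Odd trailing byte: treat it as the high byte of a final word. */
    if (len == 1)
        sum += (uint32_t)(data[0] << 8);

    /* Fold the carry bits back in to obtain the one's complement sum. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The returned value is stored into the checksum field high byte first, for example `hdr[10] = csum >> 8; hdr[11] = csum & 0xff;` for an IPv4 header held in a byte array `hdr`.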
Take the example IP header 45 00 00 54 41 e0 40 00 40 01 00 00 0a 00 00 04 0a 00 00 05:
- Adding the 16-bit words together yields the two’s complement sum 01 1b 3e.
- Then, to convert it to one’s complement, the carry-over bits are added to the low 16 bits: 1b 3e + 01 = 1b 3f.
- Finally, the one’s complement of the sum is taken, resulting in the checksum value e4 c0.
The IP header becomes 45 00 00 54 41 e0 40 00 40 01 e4 c0 0a 00 00 04 0a 00 00 05.
The checksum can be verified by applying the algorithm again, this time with the checksum field left in place. If the result is 0, the data is most likely good.
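The verification step can be combined with the other basic sanity checks on an incoming header. The sketch below is one possible arrangement, with the hypothetical name `ipv4_header_ok`: it checks the version, the header length, and that the one's complement sum over the whole header (checksum field included) folds to 0xffff, which is the same condition as the checksum algorithm returning 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the buffer starts with a plausible IPv4 header whose
 * checksum verifies. avail is the number of octets available in hdr. */
static bool ipv4_header_ok(const uint8_t *hdr, size_t avail)
{
    uint32_t sum = 0;
    size_t hlen, i;

    assert(hdr != NULL);

    if (avail < 20)                 /* shorter than the minimum header */
        return false;
    if ((hdr[0] >> 4) != 4)         /* version must be 4 */
        return false;

    hlen = (size_t)(hdr[0] & 0x0f) * 4;   /* ihl counts 32-bit words */
    if (hlen < 20 || hlen > avail)
        return false;

    /* Sum every 16-bit word, including the stored checksum. */
    for (i = 0; i < hlen; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);

    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* A correct header sums to all ones, whose complement is 0. */
    return sum == 0xffff;
}
```

Applied to the example header with e4 c0 filled in, the words sum to 01 ff fe, which folds to ff ff, so the header is accepted.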
Internet Control Message Protocol version 4
As the Internet Protocol lacks mechanisms for reliability, some way of informing communicating parties of possible error scenarios is required. As a result, the Internet Control Message Protocol (ICMP)7 is used for diagnostic measures in the network. An example of this is the case where a gateway is not reachable: the network stack that recognizes this sends an ICMP “Destination Unreachable” message back to the origin.
The ICMP header resides in the payload of the corresponding IP packet. The structure of the ICMPv4 header is as follows:
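A sketch of the header as a C struct. The struct name `icmp_v4` and the `ICMP_V4_*` constants are assumptions made for this illustration; the member names follow the field names used in this post, and the type values are the ones this post works with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Message types used in this post. */
enum icmp_v4_type {
    ICMP_V4_REPLY           = 0,  /* Echo Reply */
    ICMP_V4_DST_UNREACHABLE = 3,  /* Destination Unreachable */
    ICMP_V4_ECHO            = 8   /* Echo Request */
};

/* Sketch of the 4-octet ICMPv4 header. The message body, whose layout
 * depends on type and code, follows in the flexible array member. */
struct icmp_v4 {
    uint8_t  type;    /* purpose of the message */
    uint8_t  code;    /* refines the meaning of type */
    uint16_t csum;    /* checksum over header and payload, network order */
    uint8_t  data[];  /* type-specific message body */
};

static_assert(sizeof(struct icmp_v4) == 4,
              "ICMPv4 header must be 4 octets");
static_assert(offsetof(struct icmp_v4, data) == 4,
              "message body must start right after the header");
```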
The type field communicates the purpose of the message. 42 different8 values are reserved for the type field, but only about 8 are commonly used. In our implementation, the types 0 (Echo Reply), 3 (Destination Unreachable) and 8 (Echo Request) are used.
The code field further describes the meaning of the message. For example, when the type is 3 (Destination Unreachable), the code field implies the reason. A common error is when a packet cannot be routed to a network: the originating host then most likely receives an ICMP message with the type 3 and code 0 (Net Unreachable).
The csum field is the same checksum field as in the IPv4 header, and the same algorithm can be used to calculate it. In ICMPv4 however, the checksum is end-to-end, meaning that the payload is also included when calculating the checksum.
Messages and their processing
The actual ICMP payload consists of query/informational messages and error messages. First, we’ll look at the Echo Request/Reply messages, commonly referred to as “pinging” in networking:
The message format is compact. The id field is set by the sending host to determine which process the echo reply is intended for. For example, the process id can be stored in this field.
The seq field is the sequence number of the echo. It is simply a number starting from zero and incremented by one whenever a new echo request is formed. This is used to detect if echo messages disappear or are reordered while in transit.
The data field is optional, but often contains information like the timestamp of the echo. This can then be used to estimate the round-trip time between hosts.
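To illustrate how these fields are used when answering a ping, the sketch below turns an Echo Request into an Echo Reply in place. It operates on the raw ICMP message (header plus echo body) as a byte buffer. The names `inet_checksum` and `icmp_echo_to_reply` are hypothetical, and the IP-level work of swapping the source and destination addresses is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over a byte buffer, words read as big-endian. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(data[0] << 8);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* msg: type, code, csum(2), id(2), seq(2), then optional data.
 * Returns false and leaves msg untouched if it is not a valid request. */
static bool icmp_echo_to_reply(uint8_t *msg, size_t len)
{
    uint16_t csum;

    assert(msg != NULL);

    if (len < 8)                        /* header plus id and seq */
        return false;
    if (msg[0] != 8 || msg[1] != 0)     /* Echo Request has type 8, code 0 */
        return false;
    if (inet_checksum(msg, len) != 0)   /* checksum covers the payload too */
        return false;

    msg[0] = 0;                         /* type 0: Echo Reply */
    msg[2] = 0;                         /* zero csum before recomputing */
    msg[3] = 0;
    csum = inet_checksum(msg, len);
    msg[2] = (uint8_t)(csum >> 8);
    msg[3] = (uint8_t)(csum & 0xff);
    return true;
}
```

The id, seq and data octets are deliberately left unchanged, since the sender relies on them to match the reply to its request and to compute the round-trip time.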
Perhaps the most common ICMPv4 error message, Destination Unreachable, has the following format:
The first octet is unused. Then, the len field indicates the length of the original datagram, in 4-octet units for IPv4. The value of the 2-octet var field depends on the ICMP code. Finally, as much as possible of the original IP packet that caused the Destination Unreachable state is placed into the remainder of the message.
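The layout described above can be sketched as a C struct that follows the 4-octet ICMPv4 header. The struct name, the `data` member name, the code constants and the helper function are assumptions for this illustration; `unused`, `len` and `var` follow the description above, and the code values 0 to 3 are the standard Net, Host, Protocol and Port Unreachable codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few of the codes defined for type 3 (Destination Unreachable). */
enum icmp_v4_unreach_code {
    ICMP_V4_NET_UNREACHABLE   = 0,
    ICMP_V4_HOST_UNREACHABLE  = 1,
    ICMP_V4_PROTO_UNREACHABLE = 2,
    ICMP_V4_PORT_UNREACHABLE  = 3
};

/* Sketch of the Destination Unreachable body. */
struct icmp_v4_dst_unreachable {
    uint8_t  unused;
    uint8_t  len;     /* length of the original datagram, in 4-octet units */
    uint16_t var;     /* meaning depends on the ICMP code, network order */
    uint8_t  data[];  /* leading part of the offending IP packet */
};

static_assert(sizeof(struct icmp_v4_dst_unreachable) == 4,
              "fixed part of the message body must be 4 octets");

/* Length of the quoted original datagram in octets. */
static inline size_t
icmp_v4_unreach_orig_len(const struct icmp_v4_dst_unreachable *m)
{
    return (size_t)m->len * 4;
}
```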
Testing the implementation
From a shell, we can verify that our userspace networking stack responds to ICMP echo requests by pinging its address.
A minimum viable networking stack that handles Ethernet frames, ARP and IP can be created relatively easily. However, the original specifications have been extended with many new ones. In this post, we skimmed over IP features such as options, fragmentation and the header ECN and DS fields.
Furthermore, IPv6 is crucial for the future of the Internet. It is not yet ubiquitous, but as the successor to IPv4, it is definitely something that should be implemented in our networking stack.
The source code for this blog post can be found on GitHub.
In the next blog post we will advance to the transport layer (L4) and start implementing the notorious Transmission Control Protocol (TCP). Namely, TCP is a connection-oriented protocol and ensures reliability between both communicating sides. These aspects obviously bring about more complexity, and being an old protocol, TCP has its dark corners.