5.1 What raw sockets are for
TCP and UDP sockets only carry TCP and UDP. A raw socket (SOCK_RAW) opens the trapdoor to the IP layer itself, enabling programs to:
- send/receive ICMPv4 & ICMPv6 — ping, traceroute, router discovery;
- speak IP protocols the kernel doesn't implement (OSPF rides protocol 89 this way);
- build their own IP headers (IP_HDRINCL) — full control of TTL, flags, even source address.
Rules of the road: creation requires superuser privilege (on Linux, the CAP_NET_RAW capability), because raw power = spoofing power; there are no ports, so demultiplexing is by protocol number (plus ICMP type filtering); received IPv4 datagrams arrive with the IP header included; the kernel computes the IP header checksum, but you compute the ICMP checksum.
```c
const int on = 1;
int sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);          /* needs root */
setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on));   /* optional: DIY IP header */
```
The Internet checksum algorithm (used by ICMP, and by the UDP checksum in the next lesson): take the one's-complement sum of the 16-bit words, then complement the result. Reproduce in_cksum from memory:
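```c
#include <stdint.h>
#include <string.h>
uint16_t in_cksum(uint16_t *addr, int len) {
    uint32_t sum = 0;
    while (len > 1) { sum += *addr++; len -= 2; }   /* add 16-bit words */
    if (len == 1) {                  /* odd length: pad with a zero byte */
        uint16_t last = 0;
        memcpy(&last, addr, 1);      /* byte goes in the word's FIRST memory slot, */
        sum += last;                 /* which is right on big- and little-endian hosts */
    }
    sum = (sum >> 16) + (sum & 0xffff);   /* fold carries */
    sum += (sum >> 16);                   /* fold the carry from the fold */
    return (uint16_t)~sum;
}
```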
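To check an implementation by hand, it helps to have a variant whose result does not depend on host byte order. The sketch below (inet_checksum_be is a hypothetical name, not the textbook's function) reads each byte pair as a big-endian word, so its return value is exactly the two checksum bytes as they appear on the wire. On a little-endian host in_cksum returns the same two bytes swapped within the register, which is fine because it is stored straight back into the packet in host order.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071), byte by byte: each pair of bytes is taken
 * as a big-endian 16-bit word, so the result is identical on every host
 * and reads exactly as the checksum bytes appear on the wire.
 * Fine for packet-sized inputs (the 32-bit sum cannot overflow below 128 KB). */
uint16_t inet_checksum_be(const uint8_t *buf, size_t len) {
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;           /* odd byte, padded with a zero */
    while (sum >> 16)
        sum = (sum >> 16) + (sum & 0xffff);     /* fold carries back in */
    return (uint16_t)~sum;
}
```

With RFC 1071's worked example (bytes 00 01 f2 03 f4 f5 f6 f7, folded sum 0xddf2) it returns 0x220d, and re-summing the data with those two bytes appended gives 0: the property a receiver uses to validate a packet.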
5.2 Ping — ICMP echo in ~100 lines
Protocol: send ICMP echo request (type 8/code 0; IPv6: type 128); any host's IP stack answers with echo reply (type 0; IPv6: 129). No server process is involved: the kernel itself replies, which is why ping works against almost any host (unless a firewall or the host's configuration suppresses echo replies).
Program structure:
- Send half (paced by SIGALRM once a second): build the ICMP header (type 8, code 0, identifier = getpid() so that concurrent pings can tell their replies apart, an incrementing sequence number), use a struct timeval send timestamp as the payload (8 bytes on older 32-bit systems, typically 16 on 64-bit ones), compute the checksum, sendto.
- Receive half (the main loop): recvfrom gives the whole IP datagram → skip ip_hl × 4 bytes → check ICMP type = echo reply and identifier = mine → RTT = now − embedded timestamp:
64 bytes from 93.184.216.34: icmp_seq=1 ttl=56 time=11.3 ms
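A sketch of the receive half's filter, assuming buf is what recvfrom returned on the IPv4 raw ICMP socket and that the sender stored the identifier and sequence number in host byte order (as send_v4 in §5.7 does). The names parse_echo_reply and struct echo_result are hypothetical; the code reads bytes directly rather than casting to struct ip, which sidesteps alignment and bitfield-order questions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <netinet/in.h>          /* IPPROTO_ICMP */

#define TYPE_ECHO_REPLY 0        /* ICMP_ECHOREPLY in <netinet/ip_icmp.h> */

/* Hypothetical result type for one matched echo reply. */
struct echo_result {
    uint16_t seq;     /* sequence number copied back by the target */
    int      ttl;     /* TTL field of the reply's IPv4 header */
    double   rtt_ms;  /* now minus the timestamp embedded by the sender */
};

/* Returns 1 and fills *res if buf is an echo reply carrying our id, else 0. */
int parse_echo_reply(const uint8_t *buf, size_t n, uint16_t myid,
                     const struct timeval *now, struct echo_result *res) {
    if (n < 20 || (buf[0] >> 4) != 4) return 0;   /* need an IPv4 header */
    size_t hlen = (size_t)(buf[0] & 0x0f) * 4;    /* ip_hl x 4 bytes */
    if (hlen < 20 || buf[9] != IPPROTO_ICMP) return 0;

    struct timeval sent;
    if (n < hlen + 8 + sizeof sent) return 0;     /* ICMP header + timestamp */
    const uint8_t *icmp = buf + hlen;
    if (icmp[0] != TYPE_ECHO_REPLY) return 0;

    uint16_t id, seq;
    memcpy(&id, icmp + 4, sizeof id);     /* same byte order the sender used */
    if (id != myid) return 0;             /* another ping's reply: ignore it */
    memcpy(&seq, icmp + 6, sizeof seq);
    memcpy(&sent, icmp + 8, sizeof sent); /* the timestamp we sent, echoed back */

    res->seq = seq;
    res->ttl = buf[8];
    res->rtt_ms = (now->tv_sec - sent.tv_sec) * 1000.0
                + (now->tv_usec - sent.tv_usec) / 1000.0;
    return 1;
}
```

Fed a reply whose embedded timestamp is 11.3 ms older than now, it reports rtt_ms of about 11.3, the kind of figure shown in the sample line above; a reply carrying any other identifier is silently rejected.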
The reply's IP TTL hints at the distance: a common initial value (64, 128 or 255) minus the hops traversed. IPv6 differences: the ICMPv6 checksum covers a pseudo-header (the kernel computes it), the kernel delivers ICMPv6 without the IPv6 header, and ICMP6_FILTER selects which types you receive.
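The TTL arithmetic can be made concrete. This hypothetical helper assumes the target started from one of the common initial values 64, 128 or 255 and picks the smallest one not below the observed TTL; since any initial TTL is legal, the result is a guess, not a measurement.

```c
/* Guess hops from a received TTL, ASSUMING the sender's initial TTL was the
 * smallest of the common defaults 64, 128, 255 that is >= the observed TTL. */
int guess_hops(int ttl) {
    static const int initial[] = {64, 128, 255};
    for (int i = 0; i < 3; i++)
        if (ttl <= initial[i])
            return initial[i] - ttl;
    return -1;   /* not a valid IPv4 TTL */
}
```

For the sample reply above (ttl=56) it guesses 64 - 56 = 8 hops; a TTL of 117 would suggest a host starting at 128, 11 hops away.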
5.3 Traceroute — weaponising TTL
Every router decrements TTL; at 0 it discards the packet and returns ICMP time exceeded (type 11). Traceroute turns this error into a map:
```text
send UDP datagram, TTL=1, to dest port 33434 (unlikely to be in use)
  → first router discards, sends TIME EXCEEDED → hop 1 identified
send with TTL=2 → second router answers → hop 2
...
until the DESTINATION answers PORT UNREACHABLE (type 3 code 3)
  → arrived; stop.
```
Implementation notes: two sockets, a UDP socket for the probes (TTL set per probe via the IP_TTL socket option) and a raw ICMP socket for the answers; 3 probes per TTL, printing the router address and RTT, or * on timeout; the destination port is incremented per probe so that replies can be matched to probes (the ICMP error helpfully quotes the offending header). Variants: ICMP-echo probes (Windows tracert) and TCP-SYN probes, which get through firewalls that pass port 80 ("tcptraceroute").
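A sketch of the per-probe bookkeeping, under stated assumptions: the base port 33434 comes from the diagram above, the numbering scheme (a fresh port for each of the 3 probes at every TTL) is one simple choice rather than the textbook's exact code, and set_probe_ttl, probe_dport and classify_reply are hypothetical names. Only setsockopt with IP_TTL is a real API call here.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>          /* IPPROTO_IP, IP_TTL */

#define TR_BASE_PORT      33434  /* first "improbable" destination port */
#define TR_PROBES_PER_TTL 3

/* Set the TTL used by subsequent probes sent on this UDP socket. */
int set_probe_ttl(int udp_fd, int ttl) {
    return setsockopt(udp_fd, IPPROTO_IP, IP_TTL, &ttl, sizeof ttl);
}

/* Destination port for probe number `probe` (0-based) at a given TTL:
 * every probe gets its own port, so the port quoted in the ICMP error
 * identifies exactly which probe provoked it. */
int probe_dport(int ttl, int probe) {
    return TR_BASE_PORT + (ttl - 1) * TR_PROBES_PER_TTL + probe;
}

enum trace_verdict { TR_HOP, TR_ARRIVED, TR_OTHER };

/* Classify an ICMPv4 message received in answer to a probe. */
enum trace_verdict classify_reply(int type, int code) {
    if (type == 11 && code == 0) return TR_HOP;      /* time exceeded in transit */
    if (type == 3 && code == 3)  return TR_ARRIVED;  /* port unreachable: stop */
    return TR_OTHER;             /* anything else is not a hop reply here */
}
```

The main loop then runs ttl = 1, 2, ... calling set_probe_ttl before each batch of three sends, and stops on the first TR_ARRIVED or at the maximum TTL.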
5.4 An ICMP message daemon (icmpd) — the design exercise
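Problem from Unit 2: ICMP errors for UDP are reported only to connected sockets, and then only as a bare errno (ECONNREFUSED, say) that does not reveal which datagram or destination provoked it. The textbook's icmpd design fixes this with pure course machinery: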
- icmpd runs as root with raw ICMP sockets, watching all ICMP traffic;
- an unprivileged client passes its UDP socket descriptor to icmpd over a Unix-domain socket using SCM_RIGHTS ancillary data (Unit 2's descriptor passing!);
- icmpd calls getsockname/getpeername on the passed descriptor to learn its (addr, port), and when a matching ICMP error arrives (the quoted UDP header identifies the victim), notifies the client with the precise error.
One daemon: raw sockets + Unix-domain sockets + descriptor passing + select — a capstone showing how the course's pieces compose into real tooling.
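The descriptor-passing and getsockname steps above can be sketched with sendmsg/recvmsg and one SCM_RIGHTS control message. send_fd, recv_fd and bound_port are hypothetical names (the textbook has its own helpers); bound_port stands in for icmpd's getsockname call. One byte of ordinary data travels with the descriptor, as is customary, because on a stream socket a zero-byte read is indistinguishable from end-of-file.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Client side: pass fd_to_send across a Unix-domain socket. 0 on success. */
int send_fd(int unix_sock, int fd_to_send) {
    char byte = 0;                              /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;                 /* payload is descriptors */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd_to_send, sizeof(int));

    return sendmsg(unix_sock, &msg, 0) == 1 ? 0 : -1;
}

/* icmpd side: receive one descriptor; returns it, or -1. */
int recv_fd(int unix_sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(unix_sock, &msg, 0) <= 0)       /* 0 would mean EOF */
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof fd);      /* a NEW descriptor in this process */
    return fd;
}

/* icmpd's next step: which local port is the passed IPv4 socket bound to? */
int bound_port(int fd) {
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;
    if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0 || sin.sin_family != AF_INET)
        return -1;
    return ntohs(sin.sin_port);
}
```

A UDP socket bound to 127.0.0.1 and passed across a socketpair arrives as a new descriptor number that reports the same port, which is all icmpd needs to match future ICMP errors to the client.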
5.5 The ICMP messages this unit lives on
A working table of the types every answer in this unit cites:
| Type / code | Name | Who sends it | Used by |
|---|---|---|---|
| 8 / 0 → 0 / 0 | echo request → echo reply | any host's kernel | ping |
| 11 / 0 | time exceeded (TTL = 0 in transit) | a router | traceroute's hop discovery |
| 3 / 3 | destination unreachable: port | destination host | traceroute's terminator; UDP's ECONNREFUSED |
| 3 / 0, 3 / 1 | net / host unreachable | a router | EHOSTUNREACH soft errors |
| 3 / 4 | fragmentation needed but DF set | a router | path-MTU discovery |
| 5 | redirect | a router | route optimisation |
| IPv6: 8→128, 0→129, 3→1 (port unreachable is 1/4), 11→3; 3/4 becomes 2 (packet too big), 5 becomes 137 | the ICMPv6 renumbering | as above | same tools, new numbers |
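When icmpd (or the kernel) turns one of these messages into an error code for a UDP socket, the mapping is a small lookup. This sketch follows the errno column of the table above; mapping 3/4 to EMSGSIZE is an addition (it is the usual choice), and Linux, unlike the BSD convention the table follows, reports 3/0 as ENETUNREACH. icmp4_errno is a hypothetical name.

```c
#include <errno.h>

/* Map an ICMPv4 error (type, code) to an errno, following the table above.
 * Returns 0 for messages this sketch does not map. */
int icmp4_errno(int type, int code) {
    if (type != 3)                    /* only destination-unreachable here */
        return 0;
    switch (code) {
    case 0: case 1: return EHOSTUNREACH;  /* net / host unreachable (BSD style) */
    case 3:         return ECONNREFUSED;  /* port unreachable */
    case 4:         return EMSGSIZE;      /* fragmentation needed, DF set */
    default:        return 0;
    }
}
```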
Structural fact worth one sentence: every ICMP error message (types 3, 5, 11, ...) carries the IP header plus at least the first 8 bytes of the datagram that provoked it (RFC 792's minimum; routers are allowed to quote more), and those 8 bytes contain the UDP/TCP ports. That quoting rule is what lets traceroute match errors to probes and lets icmpd identify the victim socket; without it, ICMP errors would be anonymous.
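The quoting rule in code: a sketch that digs the UDP ports out of an ICMPv4 error read from a raw socket, assuming IPv4 throughout and UDP probes. parse_icmp_error and struct quoted_probe are hypothetical names. Both IP headers are measured with their own ip_hl field, because either may carry options.

```c
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>            /* IPPROTO_ICMP, IPPROTO_UDP */

/* Hypothetical result: what an ICMPv4 error reveals about the probe. */
struct quoted_probe {
    int type, code;                /* ICMP type/code of the error */
    uint16_t sport, dport;         /* quoted UDP ports, host byte order */
};

/* buf: outer IPv4 header, 8-byte ICMP header, quoted IPv4 header, then at
 * least 8 bytes of the offending datagram (its UDP header). 1 on success. */
int parse_icmp_error(const uint8_t *buf, size_t n, struct quoted_probe *out) {
    if (n < 20 || buf[9] != IPPROTO_ICMP) return 0;
    size_t hlen1 = (size_t)(buf[0] & 0x0f) * 4;      /* outer ip_hl x 4 */
    if (hlen1 < 20 || n < hlen1 + 8 + 20) return 0;

    const uint8_t *icmp = buf + hlen1;
    const uint8_t *inner = icmp + 8;                 /* quoted IPv4 header */
    size_t hlen2 = (size_t)(inner[0] & 0x0f) * 4;
    if (hlen2 < 20 || n < hlen1 + 8 + hlen2 + 8) return 0;
    if (inner[9] != IPPROTO_UDP) return 0;           /* not a UDP probe */

    const uint8_t *udp = inner + hlen2;              /* the quoted 8 bytes */
    out->type = icmp[0];
    out->code = icmp[1];
    out->sport = (uint16_t)(udp[0] << 8 | udp[1]);   /* network -> host order */
    out->dport = (uint16_t)(udp[2] << 8 | udp[3]);
    return 1;
}
```

Traceroute compares dport with the ports it sent (and sport with its own bound port); icmpd does the same lookup against the ports of the sockets its clients have passed it.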
5.6 Raw-socket reception rules — which datagrams reach your raw socket?
A precise mini-answer examiners like: a received datagram is delivered to a raw socket only if (1) the datagram's protocol field matches the protocol the socket was created with (when that protocol is nonzero); (2) if the raw socket is bound to a local address, the datagram's destination address matches it; (3) if the socket is connected, the datagram's source address matches the peer. Three more rules complete the picture on Berkeley-derived kernels: received TCP and UDP datagrams never reach a raw socket (the kernel's own transport handlers consume them; capture below IP, Unit 4's next lesson, is the only way to see them), though Linux is an exception and passes copies to raw sockets of the matching protocol; most ICMP is copied to matching raw sockets after the kernel has processed it; and a datagram of a protocol the kernel doesn't handle (say OSPF's 89) goes only to raw listeners, or triggers an ICMP protocol-unreachable if there are none.
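The three matching tests translate almost literally into a predicate. This is a toy model, not kernel code: it follows the textbook's Berkeley-derived rules, in which a protocol of 0 and an unbound or unconnected socket act as wildcards, and it ignores the TCP/UDP and ICMP special cases. struct raw_sock_model and raw_delivers are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an IPv4 raw socket's matching state. Zero means "not set". */
struct raw_sock_model {
    int      protocol;    /* third argument to socket(); 0 = any */
    uint32_t bound_addr;  /* set by bind(); 0 = unbound */
    uint32_t peer_addr;   /* set by connect(); 0 = unconnected */
};

/* Would a datagram (protocol, source, destination) be copied to socket s? */
bool raw_delivers(const struct raw_sock_model *s,
                  int pkt_proto, uint32_t pkt_src, uint32_t pkt_dst) {
    if (s->protocol != 0 && s->protocol != pkt_proto) return false;   /* rule 1 */
    if (s->bound_addr != 0 && s->bound_addr != pkt_dst) return false; /* rule 2 */
    if (s->peer_addr != 0 && s->peer_addr != pkt_src) return false;   /* rule 3 */
    return true;
}
```

Because every socket that passes all three tests gets its own copy, two unbound, unconnected IPPROTO_ICMP sockets both see every ICMP message the kernel passes up, which is exactly why ping must filter on its identifier.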
5.7 Ping's send half, annotated
```c
/* sendbuf, nsent, datalen, sockfd and pr are program globals declared elsewhere */
void send_v4(void) {
    struct icmp *icmp = (struct icmp *)sendbuf;
    icmp->icmp_type = ICMP_ECHO;               /* type 8 */
    icmp->icmp_code = 0;
    icmp->icmp_id = getpid() & 0xffff;         /* OUR id: reply filter */
    icmp->icmp_seq = nsent++;                  /* detect loss & reordering */
    gettimeofday((struct timeval *)icmp->icmp_data, NULL);  /* RTT clock IN the packet */
    int len = 8 + datalen;                     /* header + payload; datalen >= sizeof(struct timeval) */
    icmp->icmp_cksum = 0;                      /* MUST be 0 while summing */
    icmp->icmp_cksum = in_cksum((uint16_t *)icmp, len);
    sendto(sockfd, sendbuf, len, 0, pr->sasend, pr->salen);
}
```
Three design points, each a viva question:
- the timestamp travels inside the packet, so the receive path needs no table of send times (the network itself carries the state, and comparing the echoed payload can also check data integrity);
- the id field is the only thing separating your replies from another ping's (the kernel copies every ICMP echo reply to every raw ICMP socket, so filtering is your job);
- the checksum is computed over the packet with the checksum field zeroed; forget that and every packet is invalid.
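The zero-field rule and the receiver's check can be seen together in a byte-level sketch. Unlike send_v4, which writes icmp_id and icmp_seq in host byte order, build_echo (a hypothetical helper) stores them in network byte order, the conventional on-the-wire form; echo_valid (also hypothetical) shows why the receiver needs no special case for the checksum field, since summing a valid packet including its checksum gives 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise Internet checksum (big-endian reading of the 16-bit words). */
static uint16_t cksum(const uint8_t *p, size_t len) {
    uint32_t sum = 0;
    for (; len > 1; p += 2, len -= 2)
        sum += (uint32_t)p[0] << 8 | p[1];
    if (len)
        sum += (uint32_t)p[0] << 8;              /* odd trailing byte */
    while (sum >> 16)
        sum = (sum >> 16) + (sum & 0xffff);
    return (uint16_t)~sum;
}

/* Build an ICMPv4 echo request (8-byte header + payload); returns its length. */
size_t build_echo(uint8_t *pkt, uint16_t id, uint16_t seq,
                  const uint8_t *payload, size_t plen) {
    pkt[0] = 8;                                  /* type 8: echo request */
    pkt[1] = 0;                                  /* code 0 */
    pkt[2] = pkt[3] = 0;                         /* checksum MUST be 0 while summing */
    pkt[4] = id >> 8;  pkt[5] = id & 0xff;       /* network byte order */
    pkt[6] = seq >> 8; pkt[7] = seq & 0xff;
    memcpy(pkt + 8, payload, plen);
    uint16_t c = cksum(pkt, 8 + plen);
    pkt[2] = c >> 8;                             /* high byte first, as on the wire */
    pkt[3] = c & 0xff;
    return 8 + plen;
}

/* Receiver's check: a valid packet, checksum included, sums to 0. */
int echo_valid(const uint8_t *pkt, size_t len) {
    return cksum(pkt, len) == 0;
}
```

Flipping any single bit of a built packet makes echo_valid fail, which is the integrity check the checksum buys.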
5.8 Ping vs traceroute — one table to bind the two programs
| Aspect | ping | traceroute |
|---|---|---|
| question asked | "are you alive? how far in time?" | "which routers lie on the path?" |
| probe | ICMP echo request | UDP to an improbable port (or ICMP/TCP variants) |
| answered by | destination's kernel | each router (time exceeded), then destination (port unreachable) |
| TTL usage | default — wants the probe to arrive | the protagonist — deliberately expires at hop N |
| sockets | 1 raw ICMP | UDP sender + raw ICMP listener |
| terminates when | user stops it | port-unreachable received (or max TTL) |
| failure display | lost replies (timeouts) | * for hops that don't answer |
Worth appending: both depend on routers/hosts choosing to answer — firewalled hosts that drop ICMP make ping report a live host dead, and rate-limited routers make traceroute rows show * while later rows answer. Tool output is evidence, not truth; saying so in a viva lands well.
Exam pointers
- "What are raw sockets? State their properties and uses" — the three uses (§5.1), then the property list: root-only, no ports, protocol-based demux, IP header included on receive, ICMP checksum is yours; close with the reception rules of §5.6.
- "Explain the working of ping / traceroute": structure + diagram from §5.2/§5.3; for traceroute always name both message types it relies on (11 en route, 3/3 on arrival).
- "Write the Internet checksum function" — §5.1's in_cksum; mention one's-complement folding and the zero-field rule from §5.7.
- Design question: "how can an unprivileged process learn ICMP errors for its UDP socket?" — icmpd's three mechanisms (raw socket, Unix-domain socket, SCM_RIGHTS descriptor passing), in that order.
Check yourself
- Why does ping need no server program installed on the target — and what kernel subsystem answers?
- Which field lets two simultaneous pings on one machine sort out whose replies are whose, and why does the kernel make that necessary?
- Traceroute hop 7 shows * but hops 8–12 answer. Reconcile.
- Why must traceroute's UDP destination port be "improbable"? What would a listening port at the destination break?
- A raw IPPROTO_TCP socket is opened to sniff web traffic. What arrives, and why? Which Unit-4 facility actually solves this?