6.1 One layer lower still
Raw sockets see IP datagrams addressed to (or from) this host. Datalink access captures entire frames — anyone's — straight from the NIC (in promiscuous mode): the foundation of tcpdump, Wireshark, ARP tools and protocol analysis. Three historical doorways, one portable wrapper:
6.2 BPF — the BSD Packet Filter
The BSD design, and the conceptual winner:
- Each NIC's driver hands frames to BPF taps; a user process opens /dev/bpf, attaches an interface, sets promiscuous mode.
- In-kernel filtering: the process loads a filter program (bytecode for a small register machine) and the kernel runs it on every frame, copying only the matches to the process. Filtering before copying is the whole performance story (kernel→user copies are the expensive part).
- Buffering & timeouts: matched frames are batched in a kernel buffer and delivered when it fills or when the read timeout expires (BPF itself starts with no timeout; about 1 s is the customary setting), giving fewer, larger reads.
BPF's filter machine outgrew packet capture: Linux's modern eBPF runs verified bytecode for tracing, security and networking throughout the kernel — this syllabus item is the ancestor of one of today's hottest technologies.
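To make "bytecode for a small register machine" concrete, here is a minimal sketch of such a machine. The opcodes, the `struct insn` layout and the names `run_filter` and `udp_filter` are hypothetical and chosen for readability; real BPF uses a different instruction encoding and a richer instruction set. The sketch assumes Ethernet framing and shows only the core idea: load a field into an accumulator, compare, and return a verdict (bytes to keep, or 0 to drop).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; NOT the real BPF encoding. */
enum op { LD_H, LD_B, JEQ, RET };

struct insn {
    enum op  op;
    uint32_t k;       /* offset (loads), constant (JEQ) or verdict (RET) */
    uint8_t  jt, jf;  /* JEQ only: instructions to skip if true / false  */
};

/* Run prog over one frame; returns bytes to keep, 0 means drop. */
uint32_t run_filter(const struct insn *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t acc = 0;                       /* the accumulator register */
    for (size_t pc = 0; pc < n; pc++) {
        const struct insn *i = &prog[pc];
        switch (i->op) {
        case LD_H:                          /* load big-endian 16-bit field */
            if ((size_t)i->k + 2 > len) return 0;
            acc = ((uint32_t)pkt[i->k] << 8) | pkt[i->k + 1];
            break;
        case LD_B:                          /* load one byte */
            if ((size_t)i->k + 1 > len) return 0;
            acc = pkt[i->k];
            break;
        case JEQ:                           /* relative forward jump */
            pc += (acc == i->k) ? i->jt : i->jf;
            break;
        case RET:
            return i->k;
        }
    }
    return 0;                               /* fell off the end: drop */
}

/* "IPv4 and UDP" for Ethernet frames. */
static const struct insn udp_filter[] = {
    { LD_H, 12,     0, 0 },  /* acc = EtherType (bytes 12..13)       */
    { JEQ,  0x0800, 0, 3 },  /* not IPv4: jump to the drop           */
    { LD_B, 23,     0, 0 },  /* acc = IP protocol (14 + 9)           */
    { JEQ,  17,     0, 1 },  /* not UDP: jump to the drop            */
    { RET,  65535,  0, 0 },  /* accept: keep up to 65535 bytes       */
    { RET,  0,      0, 0 },  /* drop                                 */
};
```

Calling `run_filter(udp_filter, 6, frame, frame_len)` on each frame and copying only when the result is non-zero is, in miniature, the filter-before-copy design described above.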
6.3 DLPI and Linux's SOCK_PACKET / PF_PACKET
- DLPI (Data Link Provider Interface): SVR4/Solaris' STREAMS-based equivalent: open a /dev/le0-style device, attach/bind via DLPI messages; efficient only with kernel filter modules (pfmod/bufmod) stacked on top.
- SOCK_PACKET (old Linux): socket(AF_INET, SOCK_PACKET, htons(ETH_P_ALL)) delivers every frame from every interface; no kernel filtering, no buffering, so every frame crosses to user space and is filtered there (costly). Modern Linux replaces it with PF_PACKET sockets (AF_PACKET), which can attach (e)BPF filters and mmap ring buffers, closing the gap with BPF.
| | BPF (BSD) | DLPI (Solaris) | SOCK_PACKET / PF_PACKET (Linux) |
|---|---|---|---|
| access object | /dev/bpf device | STREAMS device | a socket |
| kernel filtering | yes — BPF bytecode | via pfmod module | old: none; PF_PACKET: BPF |
| buffering | yes (timeout/size) | via bufmod | old: none; new: ring buffer |
| efficiency | high | high (with modules) | old: poor; new: high |
6.4 libpcap — the portable capture library
libpcap wraps all three behind one API — tcpdump, Wireshark and your lab programs all sit on it:
char errbuf[PCAP_ERRBUF_SIZE];
bpf_u_int32 netmask = PCAP_NETMASK_UNKNOWN; /* or the value from pcap_lookupnet() */
pcap_t *p = pcap_open_live("eth0", 65535 /*snaplen*/, 1 /*promisc*/, 1000 /*ms*/, errbuf);
struct bpf_program fp;
pcap_compile(p, &fp, "udp and dst port 53", 1, netmask); /* tcpdump syntax → BPF bytecode */
pcap_setfilter(p, &fp);
pcap_loop(p, -1, handler, NULL); /* handler(user, pkthdr, packet-bytes) */
The filter-expression compiler is the magic: a readable string ("host A and tcp port 80") becomes kernel-loaded BPF bytecode. Each captured packet arrives with a pcap_pkthdr (timestamp, captured length, on-wire length) followed by the raw frame — your handler parses Ethernet → IP → UDP/TCP headers by hand: the best protocol-internals exercise in the course.
6.5 Examining the UDP checksum field — the textbook's capstone capture program
The motivating application: what fraction of UDP traffic actually uses checksums? (The UDP checksum is optional in IPv4 — 0 means "not computed". It covers a pseudo-header — source/dest IP, protocol, UDP length — plus header and data, catching misdelivered datagrams too.)
The program udpcksum demonstrates the full down-the-stack toolkit in one tool:
- open libpcap with filter "udp and dst port X" — datalink receive;
- send a test datagram to that port through a raw IP socket with IP_HDRINCL — hand-built IP+UDP headers, so the checksum field is under our control;
- capture the frame, parse Ethernet/IP/UDP, read uh_sum, verify the checksum ourselves with in_cksum over the pseudo-header (the previous lesson's algorithm, reused verbatim);
- print whether the peer's stack computed it.
Why capture below IP at all? Because the kernel's UDP code silently discards bad-checksum datagrams before a normal socket ever sees them — only datalink access can observe what's really on the wire. That single sentence justifies this whole lesson; quote it.
6.6 The capture path, drawn
Read off the two performance morals the diagram encodes: capture is a copy, not an interception — the normal stack still gets every frame (your tcpdump doesn't break anyone's TCP); and the filter sits before the kernel→user copy, so non-matching frames cost almost nothing. The §6.3 table's "efficiency" row is exactly the question of where (and whether) FILT and BUF exist.
6.7 Vocabulary that examiners probe
- Promiscuous mode: the NIC stops filtering by destination MAC and passes up every frame on the segment. Needed to see other hosts' traffic; your own host's traffic is visible without it. (On a modern switched LAN, frames for other hosts mostly never reach your port anyway — capture sees your traffic + broadcast/multicast unless the switch mirrors a port; this practical footnote shows real understanding.)
- snaplen: how many bytes of each frame to keep. Headers-only analysis sets ~100–200 bytes — vastly less copying and storage; 65535 means "whole packet".
- read timeout: the batching clock (libpcap's 4th argument; about 1 s is the customary setting): trade latency for fewer wakeups. Setting it to 0 means "wait until the buffer fills": fine for bulk statistics, wrong for an interactive sniffer.
- pcap_pkthdr's two lengths: caplen (bytes actually captured, ≤ snaplen) vs len (true on-wire length); analysis code must check both or walk off the end of truncated frames.
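The caplen/len relationship can be sketched in a few lines. The helper names `expected_caplen` and `captured` are hypothetical (they are not libpcap functions); the only assumption is that the captured length is the smaller of the on-wire length and the snaplen.

```c
#include <stdint.h>

/* Bytes kept from a frame of wire_len bytes under a given snaplen. */
uint32_t expected_caplen(uint32_t wire_len, uint32_t snaplen)
{
    return wire_len < snaplen ? wire_len : snaplen;
}

/* True when the bytes [off, off + need) were actually captured. */
int captured(uint32_t caplen, uint32_t off, uint32_t need)
{
    return off <= caplen && need <= caplen - off;
}
```

With a snaplen of 96, a full-size 1514-byte frame yields caplen 96: `captured(96, 14, 20)` and `captured(96, 34, 8)` both hold, so the IP and UDP headers are safe to read, while `captured(96, 42, 1472)` fails, so anything that needs the whole payload must be skipped.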
6.8 Parsing a captured frame — the handler skeleton
The exam's favourite practical fragment — walk the headers by arithmetic, converting byte order as you go (everything on the wire is big-endian, lesson 2's rule applied one layer down):
void handler(u_char *user, const struct pcap_pkthdr *h, const u_char *p) {
    if (h->caplen < 14 + sizeof(struct ip)) return;            /* truncated frame */
    const struct ether_header *eth = (const struct ether_header *)p;
    if (ntohs(eth->ether_type) != ETHERTYPE_IP) return;        /* not IPv4 */
    const struct ip *ip = (const struct ip *)(p + 14);         /* Ethernet = 14 */
    int iphl = ip->ip_hl << 2;                                 /* header words → bytes */
    if (ip->ip_p != IPPROTO_UDP) return;
    if (h->caplen < 14 + iphl + sizeof(struct udphdr)) return; /* UDP header cut off */
    const struct udphdr *udp = (const struct udphdr *)((const u_char *)ip + iphl);
    printf("%s:%d → ", inet_ntoa(ip->ip_src), ntohs(udp->uh_sport));
    printf("%d sum=0x%04x%s\n", ntohs(udp->uh_dport), ntohs(udp->uh_sum),
           udp->uh_sum == 0 ? " (NO CHECKSUM)" : "");
}
The three traps it sidesteps are the marks: the IP header length is variable (options!) — always ip_hl × 4, never a constant 20; every multi-byte field needs ntohs/ntohl; and the Ethernet header is exactly 14 bytes (6 + 6 + 2) — a number worth simply knowing.
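For practising the arithmetic without a live capture, the same walk can be done on a plain byte buffer. The sketch below is a variant that uses byte offsets instead of the system header structs, so it needs neither libpcap nor a NIC; `struct udp_info`, `be16` and `parse_udp` are hypothetical names, and the code assumes an untagged Ethernet II frame carrying IPv4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct udp_info {               /* hypothetical result record */
    uint8_t  src[4], dst[4];    /* IPv4 addresses, network order */
    uint16_t sport, dport, sum; /* host order */
};

/* Read a big-endian 16-bit field: the byte-buffer equivalent of ntohs. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is not IPv4/UDP or is too short. */
int parse_udp(const uint8_t *p, size_t caplen, struct udp_info *out)
{
    if (caplen < 14 + 20) return -1;            /* Ethernet + minimal IP   */
    if (be16(p + 12) != 0x0800) return -1;      /* EtherType is not IPv4   */

    const uint8_t *ip = p + 14;                 /* Ethernet header = 14    */
    if ((ip[0] >> 4) != 4) return -1;           /* IP version              */
    size_t iphl = (size_t)(ip[0] & 0x0f) * 4;   /* header words to bytes   */
    if (iphl < 20 || caplen < 14 + iphl + 8) return -1;
    if (ip[9] != 17) return -1;                 /* protocol is not UDP     */

    const uint8_t *udp = ip + iphl;             /* skips any IP options    */
    memcpy(out->src, ip + 12, 4);
    memcpy(out->dst, ip + 16, 4);
    out->sport = be16(udp);
    out->dport = be16(udp + 2);
    out->sum   = be16(udp + 6);
    return 0;
}
```

Feeding it a hand-built frame whose first IP byte is 0x46 (header length 6 words, i.e. 24 bytes) is a quick way to confirm that the UDP header is found at offset 38 rather than the constant-20 answer of 34.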
6.9 The UDP pseudo-header, spelled out
Since udpcksum's verification step hinges on it, lay the pseudo-header out once:
+--------+--------+--------+--------+
|        source IPv4 address        |  ← from the IP header
+--------+--------+--------+--------+
|     destination IPv4 address      |  ← from the IP header
+--------+--------+--------+--------+
|  zero  |proto=17|   UDP length    |
+--------+--------+--------+--------+
...then the real UDP header + data (padded to even length)
It is prepended for computation only — never transmitted. Its purpose: the checksum then covers the addresses, so a datagram delivered to the wrong host (corrupted destination, misrouted) fails verification at the receiver even though its own header and data are intact. One subtle consequence: NAT boxes that rewrite addresses must also fix up UDP/TCP checksums — a modern footnote that earns credit.
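The layout translates almost line for line into code. The following is a minimal sketch of the Internet checksum applied to the pseudo-header plus the UDP header and data; `sum16`, `fold` and `udp_cksum` are hypothetical names (the textbook's own routine is in_cksum, whose source is not reproduced here), and the sketch covers IPv4 only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add len bytes to a running sum as big-endian 16-bit words. */
static uint32_t sum16(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                                /* odd byte: pad with a zero byte */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Fold the carries back in (one's-complement add), then complement. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum over pseudo-header + UDP header + data.
 * With the checksum field zeroed, the result is the value to place there.
 * With the received checksum in place, a result of 0 means it verifies. */
uint16_t udp_cksum(const uint8_t src[4], const uint8_t dst[4],
                   const uint8_t *udp, uint16_t udp_len)
{
    uint8_t pseudo[12];
    memcpy(pseudo, src, 4);                 /* source IPv4 address      */
    memcpy(pseudo + 4, dst, 4);             /* destination IPv4 address */
    pseudo[8]  = 0;                         /* zero                     */
    pseudo[9]  = 17;                        /* protocol = UDP           */
    pseudo[10] = (uint8_t)(udp_len >> 8);   /* UDP length, big-endian   */
    pseudo[11] = (uint8_t)(udp_len & 0xff);

    uint32_t sum = sum16(0, pseudo, sizeof pseudo);
    sum = sum16(sum, udp, udp_len);
    return fold(sum);
}
```

Because the addresses are part of the sum, calling `udp_cksum` with a different destination address on an otherwise untouched datagram returns a non-zero result, which is exactly the misdelivery check the pseudo-header exists to provide. A sender that computes 0 transmits 0xffff instead, since 0 on the wire means "not computed" in IPv4.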
Exam pointers
- "Compare BPF, DLPI and SOCK_PACKET" — §6.3's table; structure the prose around the two design questions: where is the filter? and where is the buffer?
- "What is libpcap? Explain a capture program's structure" — the five calls in order (open_live → compile → setfilter → loop → handler), then §6.8's parsing skeleton.
- "Why is the UDP checksum computed over a pseudo-header?" — misdelivery detection; draw §6.9; mention optional-in-IPv4 (0 = absent) vs mandatory in IPv6 (no IP-header checksum exists there to back it up).
- The one-sentence why-this-lesson answer: bad-checksum datagrams die inside the kernel — only the tap before IP sees the truth.
Check yourself
- Trace a frame destined for another host through §6.6's diagram twice: once with kernel filtering (BPF), once with old SOCK_PACKET. Where does the cost diverge?
- Your sniffer on a switched office LAN sees almost no traffic between two other PCs. Is the code broken? What changed since the shared-Ethernet era, and what's the operational fix?
- caplen = 96 but len = 1514 — what did the capture configuration say, and which analyses remain valid?
- Why must udpcksum send its test datagram with IP_HDRINCL rather than through a normal UDP socket? (What field could it not otherwise control?)
- An IPv4 datagram arrives with uh_sum = 0. Error or not? Same question for IPv6 — and why does IPv6 differ?