Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Unit 4: Datalink Access — BPF, DLPI, SOCK_PACKET, libpcap & the UDP Checksum

Lesson 14 of 15 in the free Network Programming notes on Siksha Sarovar, written by Rohit Jangra.

6.1 One layer lower still

Raw sockets see IP datagrams addressed to (or from) this host. Datalink access goes one layer lower: it captures entire frames straight from the NIC, and in promiscuous mode that includes frames addressed to other hosts. This is the foundation of tcpdump, Wireshark, ARP tools and protocol analysis. There are three historical doorways (BPF, DLPI, SOCK_PACKET) and one portable wrapper (libpcap).

6.2 BPF — the BSD Packet Filter

The BSD design, and the conceptual winner:

  • Each NIC's driver hands frames to BPF taps; a user process opens /dev/bpf, attaches an interface, sets promiscuous mode.
  • In-kernel filtering: the process loads a filter program — bytecode for a small register machine — and the kernel runs it on every frame, copying to the process only matches. Filtering before copying is the whole performance story (kernel→user copies are the expensive part).
  • Buffering & timeouts: matched frames are batched in a kernel buffer and delivered when it fills or when the read timeout expires (capture tools conventionally set about 1 s), giving fewer, larger reads.
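The "small register machine" idea can be made concrete with a toy interpreter. The sketch below is not the real BPF instruction encoding: the opcode names, the `toy_insn` struct and `toy_run` are hypothetical, loosely modelled on classic BPF's accumulator, relative jumps and "return the number of bytes to keep" convention. It only illustrates how a filter can reject a frame before any copy to user space would happen.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, loosely modelled on classic BPF. */
enum toy_op { LD_H, LD_B, JEQ, RET };

struct toy_insn {
    enum toy_op op;
    uint32_t k;      /* offset (loads), constant (JEQ) or return value (RET) */
    uint8_t jt, jf;  /* instructions to skip when JEQ is true / false */
};

/* Run the filter over one frame.
 * Returns the number of bytes to keep; 0 means "drop, copy nothing". */
uint32_t toy_run(const struct toy_insn *prog, size_t n,
                 const uint8_t *pkt, size_t len)
{
    uint32_t acc = 0;                       /* the accumulator register */
    for (size_t pc = 0; pc < n; pc++) {
        const struct toy_insn *in = &prog[pc];
        switch (in->op) {
        case LD_H:                          /* load big-endian 16-bit word */
            if ((size_t)in->k + 2 > len) return 0;
            acc = (uint32_t)pkt[in->k] << 8 | pkt[in->k + 1];
            break;
        case LD_B:                          /* load one byte */
            if ((size_t)in->k + 1 > len) return 0;
            acc = pkt[in->k];
            break;
        case JEQ:                           /* jump relative to next insn */
            pc += (acc == in->k) ? in->jt : in->jf;
            break;
        case RET:
            return in->k;
        }
    }
    return 0;                               /* fell off the end: drop */
}

/* Roughly "ip and udp" on Ethernet: EtherType at offset 12,
 * IP protocol byte at offset 14 + 9 = 23. */
static const struct toy_insn udp_filter[] = {
    { LD_H, 12,     0, 0 },
    { JEQ,  0x0800, 0, 3 },   /* not IPv4: skip ahead to the drop */
    { LD_B, 23,     0, 0 },
    { JEQ,  17,     0, 1 },   /* not UDP: skip the accept */
    { RET,  65535,  0, 0 },   /* accept, keep up to 65535 bytes */
    { RET,  0,      0, 0 },   /* drop */
};
```

Calling `toy_run(udp_filter, 6, frame, caplen)` on each frame and copying only when the result is nonzero mirrors the "filter before copy" order described above.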

BPF's filter machine outgrew packet capture: Linux's modern eBPF runs verified bytecode for tracing, security and networking throughout the kernel — this syllabus item is the ancestor of one of today's hottest technologies.

6.3 DLPI and Linux's SOCK_PACKET / PF_PACKET

  • DLPI (Data Link Provider Interface): SVR4/Solaris' STREAMS-based equivalent — open /dev/le0-style device, attach/bind via DLPI messages; efficient only with kernel filter modules (pfmod/bufmod) stacked on top.
  • SOCK_PACKET (old Linux): socket(AF_INET, SOCK_PACKET, htons(ETH_P_ALL)) delivers every frame from every interface, with no kernel filtering and no buffering, so every frame crosses to user space and is filtered there (costly). Modern Linux replaces it with PF_PACKET sockets (AF_PACKET), e.g. socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)), which can attach (e)BPF filters and mmap ring buffers, closing the gap with BPF.

                   | BPF (BSD)           | DLPI (Solaris)       | SOCK_PACKET / PF_PACKET (Linux)
access object      | /dev/bpf device     | STREAMS device       | a socket
kernel filtering   | yes (BPF bytecode)  | via pfmod module     | old: none; PF_PACKET: BPF
buffering          | yes (timeout/size)  | via bufmod           | old: none; new: ring buffer
efficiency         | high                | high (with modules)  | old: poor; new: high

6.4 libpcap — the portable capture library

libpcap wraps all three behind one API — tcpdump, Wireshark and your lab programs all sit on it:

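#include <stdio.h>
#include <stdlib.h>
#include <pcap.h>                             /* link with -lpcap */
/* inside main(), with handler() defined elsewhere: */
char errbuf[PCAP_ERRBUF_SIZE];
bpf_u_int32 netmask = PCAP_NETMASK_UNKNOWN;   /* or look it up with pcap_lookupnet() */
pcap_t *p = pcap_open_live("eth0", 65535 /*snaplen*/, 1 /*promisc*/, 1000 /*ms*/, errbuf);
if (p == NULL) { fprintf(stderr, "open: %s\n", errbuf); exit(1); }
struct bpf_program fp;
if (pcap_compile(p, &fp, "udp and dst port 53", 1, netmask) == -1 ||  /* tcpdump syntax → BPF bytecode */
    pcap_setfilter(p, &fp) == -1) { fprintf(stderr, "filter: %s\n", pcap_geterr(p)); exit(1); }
pcap_loop(p, -1, handler, NULL);    /* handler(user, pkthdr, packet-bytes) */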

The filter-expression compiler is the magic: a readable string ("host A and tcp port 80") becomes kernel-loaded BPF bytecode. Each captured packet arrives with a pcap_pkthdr (timestamp, captured length, on-wire length) followed by the raw frame — your handler parses Ethernet → IP → UDP/TCP headers by hand: the best protocol-internals exercise in the course.

6.5 Examining the UDP checksum field — the textbook's capstone capture program

The motivating application: what fraction of UDP traffic actually uses checksums? (The UDP checksum is optional in IPv4 — 0 means "not computed". It covers a pseudo-header — source/dest IP, protocol, UDP length — plus header and data, catching misdelivered datagrams too.)

The program udpcksum demonstrates the full down-the-stack toolkit in one tool:

  1. open libpcap with a filter that matches the peer's reply, e.g. "udp and src host H and src port X" (datalink receive);
  2. send a test datagram (the textbook uses a DNS query) to port X on host H through a raw IP socket with IP_HDRINCL: hand-built IP+UDP headers, so the checksum field is under our control;
  3. capture the reply frame, parse Ethernet/IP/UDP, read uh_sum, and if it is nonzero verify the checksum ourselves with in_cksum over the pseudo-header (the previous lesson's algorithm, reused verbatim);
  4. print whether the peer's stack computed it.
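The hand-built headers of step 2 can be sketched without any socket calls. The function below is an illustration, not the textbook's code: `build_udp_datagram`, `put16` and the constants are hypothetical names, the IPv4 header carries no options, and the UDP checksum field is simply whatever the caller passes in (0 meaning "not computed").

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { IP_HDR_LEN = 20, UDP_HDR_LEN = 8 };

/* Store a 16-bit value in network (big-endian) byte order. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Fill buf with an option-less IPv4 header, a UDP header and the payload.
 * src and dst are 4-byte addresses in network order.
 * udp_sum is written verbatim into the UDP checksum field.
 * Returns the total datagram length, or 0 if it does not fit. */
size_t build_udp_datagram(uint8_t *buf, size_t buflen,
                          const uint8_t src[4], const uint8_t dst[4],
                          uint16_t sport, uint16_t dport, uint16_t udp_sum,
                          const void *payload, size_t paylen)
{
    size_t total = IP_HDR_LEN + UDP_HDR_LEN + paylen;
    if (total > buflen || total > 65535) return 0;

    memset(buf, 0, IP_HDR_LEN + UDP_HDR_LEN);
    buf[0] = 0x45;                       /* version 4, header length 5 words */
    put16(buf + 2, (uint16_t)total);     /* total length */
    buf[8] = 64;                         /* TTL */
    buf[9] = 17;                         /* protocol = UDP */
    /* bytes 10-11 (IP header checksum) stay 0 in this sketch */
    memcpy(buf + 12, src, 4);
    memcpy(buf + 16, dst, 4);

    uint8_t *udp = buf + IP_HDR_LEN;
    put16(udp + 0, sport);
    put16(udp + 2, dport);
    put16(udp + 4, (uint16_t)(UDP_HDR_LEN + paylen));   /* UDP length */
    put16(udp + 6, udp_sum);                            /* our choice */
    if (paylen) memcpy(udp + UDP_HDR_LEN, payload, paylen);
    return total;
}
```

The resulting buffer is what would be handed to sendto() on a raw socket with IP_HDRINCL set, which needs root privileges. On Linux the kernel fills in the IP header checksum for such sockets; other systems differ in the details, so treat the zeroed field as an assumption of this sketch.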

Why capture below IP at all? Because the kernel's UDP code silently discards bad-checksum datagrams before a normal socket ever sees them — only datalink access can observe what's really on the wire. That single sentence justifies this whole lesson; quote it.

6.6 The capture path, drawn

Read off the two performance morals the diagram encodes: capture is a copy, not an interception — the normal stack still gets every frame (your tcpdump doesn't break anyone's TCP); and the filter sits before the kernel→user copy, so non-matching frames cost almost nothing. The §6.3 table's "efficiency" row is exactly the question of where (and whether) FILT and BUF exist.

6.7 Vocabulary that examiners probe

  • Promiscuous mode: the NIC stops filtering by destination MAC and passes up every frame on the segment. Needed to see other hosts' traffic; your own host's traffic is visible without it. (On a modern switched LAN, frames for other hosts mostly never reach your port anyway — capture sees your traffic + broadcast/multicast unless the switch mirrors a port; this practical footnote shows real understanding.)
  • snaplen: how many bytes of each frame to keep. Headers-only analysis sets ~100–200 bytes — vastly less copying and storage; 65535 means "whole packet".
  • read timeout: the batching clock (libpcap's 4th argument, in milliseconds; about 1 s is the conventional choice for capture tools). It trades latency for fewer wakeups. Setting it to 0 typically means "wait until the buffer fills": fine for bulk statistics, wrong for an interactive sniffer.
  • pcap_pkthdr's two lengths: caplen (bytes actually captured, ≤ snaplen) vs len (true on-wire length) — analysis code must check both or walk off the end of truncated frames.

6.8 Parsing a captured frame — the handler skeleton

The exam's favourite practical fragment — walk the headers by arithmetic, converting byte order as you go (everything on the wire is big-endian, lesson 2's rule applied one layer down):

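#include <stdio.h>
#include <pcap.h>
#include <arpa/inet.h>        /* ntohs, inet_ntoa */
#include <net/ethernet.h>     /* struct ether_header, ETHERTYPE_IP */
#include <netinet/in.h>       /* IPPROTO_UDP */
#include <netinet/ip.h>       /* struct ip */
#include <netinet/udp.h>      /* struct udphdr */

void handler(u_char *user, const struct pcap_pkthdr *h, const u_char *p) {
    if (h->caplen < 14 + sizeof(struct ip)) return;       /* truncated frame */
    const struct ether_header *eth = (const struct ether_header *)p;
    if (ntohs(eth->ether_type) != ETHERTYPE_IP) return;   /* not IPv4      */

    const struct ip *ip = (const struct ip *)(p + 14);    /* Ethernet = 14 */
    int iphl = ip->ip_hl << 2;                            /* header words → bytes */
    if (ip->ip_p != IPPROTO_UDP) return;
    if (iphl < 20 || h->caplen < 14 + iphl + sizeof(struct udphdr)) return;  /* UDP header cut off */

    const struct udphdr *udp = (const struct udphdr *)((const u_char *)ip + iphl);
    printf("%s:%d → ", inet_ntoa(ip->ip_src), ntohs(udp->uh_sport));
    printf("%d  sum=0x%04x%s\n", ntohs(udp->uh_dport), ntohs(udp->uh_sum),
           udp->uh_sum == 0 ? "  (NO CHECKSUM)" : "");
}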

The three traps it sidesteps are the marks: the IP header length is variable (options!) — always ip_hl × 4, never a constant 20; every multi-byte field needs ntohs/ntohl; and the Ethernet header is exactly 14 bytes (6 + 6 + 2) — a number worth simply knowing.
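To see the variable-header trap bite, it helps to run the same walk on a hand-built frame without libpcap. The following is a sketch under assumptions: `parse_udp`, `get16` and `struct udp_info` are hypothetical names, the frame is plain Ethernet II (no VLAN tag), and fields are read byte by byte rather than through the system header structs.

```c
#include <stddef.h>
#include <stdint.h>

enum { ETH_HDR_LEN = 14, UDP_HDR_LEN = 8 };

struct udp_info {             /* hypothetical result type for this sketch */
    uint16_t sport, dport, sum;
    size_t   ip_hdr_len;
};

/* Read a big-endian 16-bit field (the byte-wise equivalent of ntohs). */
static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)(p[0] << 8 | p[1]);
}

/* Returns 1 and fills *out if the captured bytes hold IPv4 + UDP headers,
 * 0 otherwise. caplen is the number of bytes actually captured. */
int parse_udp(const uint8_t *frame, size_t caplen, struct udp_info *out)
{
    if (caplen < ETH_HDR_LEN + 20) return 0;        /* no room for minimal IP */
    if (get16(frame + 12) != 0x0800) return 0;      /* EtherType: IPv4 only */

    const uint8_t *ip = frame + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4) return 0;                /* version field */
    size_t iphl = (size_t)(ip[0] & 0x0f) * 4;       /* IHL words to bytes */
    if (iphl < 20 || ip[9] != 17) return 0;         /* malformed, or not UDP */
    if (caplen < ETH_HDR_LEN + iphl + UDP_HDR_LEN)  /* cut off by snaplen */
        return 0;

    const uint8_t *udp = ip + iphl;                 /* never ip + 20 */
    out->sport = get16(udp);
    out->dport = get16(udp + 2);
    out->sum   = get16(udp + 6);
    out->ip_hdr_len = iphl;
    return 1;
}
```

Feeding it a frame whose first IP byte is 0x46 (one option word, so a 24-byte header) shows why a constant 20 would read the ports four bytes too early.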

6.9 The UDP pseudo-header, spelled out

Since udpcksum's verification step hinges on it, lay the pseudo-header out once:

+--------+--------+--------+--------+
|        source IPv4 address        |   ← from the IP header
+--------+--------+--------+--------+
|      destination IPv4 address     |   ← from the IP header
+--------+--------+--------+--------+
|  zero  |proto=17|    UDP length   |
+--------+--------+--------+--------+
   ...then the real UDP header + data (padded to even length)

It is prepended for computation only — never transmitted. Its purpose: the checksum then covers the addresses, so a datagram delivered to the wrong host (corrupted destination, misrouted) fails verification at the receiver even though its own header and data are intact. One subtle consequence: NAT boxes that rewrite addresses must also fix up UDP/TCP checksums — a modern footnote that earns credit.
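A minimal sketch of the computation follows, assuming IPv4 and addresses supplied as 4 bytes in network order. The names `udp_cksum` and `sum16` are hypothetical, not the textbook's in_cksum, though the arithmetic is the same one's-complement sum. The pseudo-header is never built in memory here; its words are simply added to the running sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of data to a running sum. */
static uint32_t sum16(uint32_t sum, const uint8_t *data, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)data[0] << 8 | data[1];
        data += 2;
        len -= 2;
    }
    if (len) sum += (uint32_t)data[0] << 8;   /* odd byte: pad with zero */
    return sum;
}

/* One's-complement checksum over pseudo-header + UDP header + data.
 * udp points at the UDP header; udplen is header plus data, in bytes.
 * Sender: call with the checksum field zeroed, store the result
 *         (a computed 0 is transmitted as 0xffff).
 * Receiver: call on the datagram as received; 0 means "verified". */
uint16_t udp_cksum(const uint8_t src[4], const uint8_t dst[4],
                   const uint8_t *udp, size_t udplen)
{
    uint32_t sum = 0;
    sum = sum16(sum, src, 4);            /* source IPv4 address */
    sum = sum16(sum, dst, 4);            /* destination IPv4 address */
    sum += 17;                           /* zero byte + protocol */
    sum += (uint32_t)udplen;             /* UDP length */
    sum = sum16(sum, udp, udplen);       /* real header + data */
    while (sum >> 16)                    /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Verifying the same datagram against a different destination address gives a nonzero result, which is the misdelivery detection described above and also the reason a NAT box must rewrite the checksum when it rewrites an address.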

Exam pointers

  • "Compare BPF, DLPI and SOCK_PACKET" — §6.3's table; structure the prose around the two design questions: where is the filter? and where is the buffer?
  • "What is libpcap? Explain a capture program's structure": the four calls in order (open_live → compile → setfilter → loop) plus the handler callback that pcap_loop invokes, then §6.8's parsing skeleton.
  • "Why is the UDP checksum computed over a pseudo-header?" — misdelivery detection; draw §6.9; mention optional-in-IPv4 (0 = absent) vs mandatory in IPv6 (no IP-header checksum exists there to back it up).
  • The one-sentence why-this-lesson answer: bad-checksum datagrams die inside the kernel — only the tap before IP sees the truth.

Check yourself

  1. Trace a frame destined for another host through §6.6's diagram twice: once with kernel filtering (BPF), once with old SOCK_PACKET. Where does the cost diverge?
  2. Your sniffer on a switched office LAN sees almost no traffic between two other PCs. Is the code broken? What changed since the shared-Ethernet era, and what's the operational fix?
  3. caplen = 96 but len = 1514 — what did the capture configuration say, and which analyses remain valid?
  4. Why must udpcksum send its test datagram with IP_HDRINCL rather than through a normal UDP socket? (What field could it not otherwise control?)
  5. An IPv4 datagram arrives with uh_sum = 0. Error or not? Same question for IPv6 — and why does IPv6 differ?