1.1 Where socket programs sit in the layer cake
| OSI layer | What network programs see |
|---|---|
| 7 Application | your program (all three top layers) |
| 6 Presentation | your program |
| 5 Session | your program |
| 4 Transport | TCP / UDP: the sockets API boundary |
| 3 Network | IP (raw sockets reach here; Unit 4) |
| 2 Data link | BPF / DLPI / SOCK_PACKET reach here (Unit 4) |
| 1 Physical | |
The sockets API is the door between the application process and the kernel's transport layer. Everything above that boundary (OSI layers 5 to 7) is your code; everything from the transport layer down runs in the kernel. The practical model is the 4-layer TCP/IP stack (application / transport / network / link) rather than the full 7-layer OSI model.
1.2 Unix standards (know the names)
| Standard | What it is |
|---|---|
| POSIX (IEEE 1003) | the portable OS interface — defines the sockets functions we use |
| The Single UNIX Specification / XPG | The Open Group's superset; "UNIX certified" systems implement it |
| BSD heritage | sockets were born in 4.2BSD (1983); "Berkeley sockets" is still the API's name |
| SVR4 / TLI / XTI | System V's alternative transport interface — historical, lost to sockets |
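Practical meaning: code written to the POSIX sockets API compiles with little or no change on Linux, the BSDs, macOS and Solaris. Windows offers a close variant, Winsock, which needs some porting (for example WSAStartup() at start-up and closesocket() instead of close()).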
1.3 TCP vs UDP — the two transports you program against
| Property | TCP | UDP |
|---|---|---|
| Type | connection-oriented byte stream | connectionless datagrams |
| Reliability | acknowledgements, retransmission, no loss/dup | none — "best effort" |
| Ordering | guaranteed in-order | may reorder |
| Flow control | sliding window | none |
| Congestion control | yes | no |
| Message boundaries | not preserved (a stream!) | preserved (one sendto = one datagram) |
| Typical uses | HTTP, SSH, mail, file transfer | DNS, DHCP, SNMP, NTP, voice/video, games |
Two stream facts that bite programmers: a single write may arrive as several reads (and vice versa) — applications must impose their own message framing; and TCP is full-duplex — data flows both directions independently.
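The framing requirement can be illustrated with a length-prefix scheme. This is a minimal sketch, not the only way to frame messages: the helper names `send_msg`, `recv_exact` and `recv_msg` and the 4-byte big-endian length prefix are assumptions made for illustration, and `socket.socketpair()` stands in for a TCP connection because it gives the same byte-stream semantics without needing a network.

```python
import socket
import struct

def send_msg(sock, payload: bytes) -> None:
    """Send one message: a 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have arrived."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)      # may return fewer than n bytes
        if not chunk:             # b"" means the peer closed the stream
            raise ConnectionError("peer closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock) -> bytes:
    """Read the length prefix, then read exactly that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

def framing_demo():
    # A connected pair of stream sockets, standing in for a TCP connection.
    a, b = socket.socketpair()
    with a, b:
        send_msg(a, b"hello")
        send_msg(a, b"world!")
        # The two messages come back intact however the stream was chunked.
        return [recv_msg(b), recv_msg(b)]
```

The loop in `recv_exact` is the important part: it is what makes the receiver independent of how many reads a single write turns into.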
1.4 TCP connection establishment & termination (programmer's view)
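Three-way handshake, and which socket calls trigger what: the client's connect() sends SYN; the listening server's kernel replies with SYN+ACK; the client's kernel answers with ACK and connect() returns; the server's accept() then returns the completed connection.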
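The call sequence can be tried on the loopback interface. The sketch below assumes that 127.0.0.1 is usable and lets the kernel pick the port; the function name `handshake_demo` is hypothetical. It runs in a single thread because the kernel completes the handshake for a listening socket before accept() is called.

```python
import socket

def handshake_demo() -> bool:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))        # port 0: let the kernel choose a free port
    srv.listen(5)                     # passive open: CLOSED -> LISTEN
    server_addr = srv.getsockname()

    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Active open: the kernel sends SYN, waits for SYN+ACK, sends ACK,
    # and only then does connect() return.
    cli.connect(server_addr)

    # accept() hands back a connection whose handshake is already complete.
    conn, peer = srv.accept()
    same_endpoint = (peer == cli.getsockname())

    for s in (conn, cli, srv):
        s.close()
    return same_endpoint
```

Note that `accept()` returns a new socket for the connection; the listening socket stays in LISTEN and can accept further clients.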
Termination takes four segments (each direction closes independently): FIN → ACK → FIN → ACK. The end that closes first enters TIME_WAIT and stays there for 2·MSL (twice the maximum segment lifetime, typically 1–4 minutes total). Why TIME_WAIT exists — two reasons examiners want verbatim:
- to retransmit the final ACK if it is lost (reliable full-duplex close), and
- to let old duplicate segments die before the same socket pair (local IP and port, remote IP and port) is reused, protecting a new incarnation of the connection from ghosts of the old one.
TCP states you must recognise in netstat: LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED, FIN_WAIT_1/2, CLOSE_WAIT, CLOSING, LAST_ACK, TIME_WAIT, CLOSED.
Segment format essentials: source/destination ports (16 bits each), sequence and acknowledgement numbers (32 bits each), flags (SYN, ACK, FIN, RST, PSH, URG), window (16 bits, used for flow control), checksum. UDP's header is just 8 bytes: ports, length, checksum.
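The field widths above map directly onto a `struct` format string. The following is a sketch that decodes the fixed 20-byte part of a TCP header and the 8-byte UDP header; the function names and the hand-built sample headers are assumptions for illustration, and TCP options are ignored.

```python
import struct

# Bit positions of the six classic flags in the low bits of the
# 16-bit "data offset + flags" word.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(data: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, _urgent) = struct.unpack("!HHIIHHHH", data[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,   # data offset is in 32-bit words
        "flags": sorted(n for n, bit in TCP_FLAGS.items() if off_flags & bit),
        "window": window,
        "checksum": checksum,
    }

def parse_udp_header(data: bytes) -> dict:
    """Decode the 8-byte UDP header: ports, length, checksum."""
    src, dst, length, checksum = struct.unpack("!HHHH", data[:8])
    return {"src_port": src, "dst_port": dst,
            "length": length, "checksum": checksum}

# Hand-built samples (checksums left as 0, so these are not valid on the wire).
sample_syn_ack = struct.pack("!HHIIHHHH", 80, 50000, 1000, 2001,
                             (5 << 12) | 0x12, 65535, 0, 0)
sample_udp = struct.pack("!HHHH", 53, 40000, 8 + 12, 0)
```

The format strings themselves confirm the sizes: `struct.calcsize("!HHIIHHHH")` is 20 and `struct.calcsize("!HHHH")` is 8.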
1.5 Buffer sizes and limitations (frequent short-answer question)
| Quantity | Typical value / limit |
|---|---|
| IPv4 datagram max | 65,535 bytes (16-bit total-length field) |
| IPv4 minimum reassembly buffer | 576 bytes: every host must accept an IP datagram this large, so a UDP payload of up to 548 bytes (576 − 20 IP − 8 UDP) is the safe upper bound |
| Ethernet MTU | 1500 bytes |
| MSS (max segment size) | MTU − 40 (IPv4: 20 IP + 20 TCP) = 1460 bytes on Ethernet |
| TCP send/receive socket buffers | kernel-dependent (e.g. 64 KB+), tunable via SO_SNDBUF / SO_RCVBUF |
| Path MTU | smallest MTU along the route; discovered via DF bit + ICMP |
A datagram larger than the MTU is fragmented by IP; TCP avoids fragmentation by never sending more than the MSS; UDP applications must size datagrams themselves — that's why DNS classically capped answers at 512 bytes.
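The arithmetic behind these limits is short enough to write down. The sketch below assumes IPv4 and TCP headers without options (20 bytes each); the function names are hypothetical.

```python
import math

IPV4_HEADER = 20   # bytes, no IP options
TCP_HEADER = 20    # bytes, no TCP options
UDP_HEADER = 8     # bytes

def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 datagram."""
    return mtu - IPV4_HEADER - TCP_HEADER

def max_udp_payload(ip_datagram_size: int) -> int:
    """Largest UDP payload that fits in an IPv4 datagram of the given size."""
    return ip_datagram_size - IPV4_HEADER - UDP_HEADER

def fragments_needed(udp_payload: int, mtu: int) -> int:
    """How many IP fragments a UDP datagram becomes on a link with this MTU."""
    ip_payload = UDP_HEADER + udp_payload
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)
```

For example, `tcp_mss(1500)` gives 1460, `max_udp_payload(576)` gives 548, `max_udp_payload(65535)` gives 65507, and a 4000-byte UDP payload on Ethernet needs `fragments_needed(4000, 1500)` = 3 fragments.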
1.6 Standard Internet services & protocol usage
Classic simple services (historically served by inetd; see Unit 3): echo (7), discard (9), daytime (13), chargen (19), time (37). Well-known ports live in /etc/services: FTP 21, SSH 22, telnet 23, SMTP 25, DNS 53, HTTP 80, POP3 110, NTP 123, HTTPS 443. Port ranges: 0–1023 well-known (privileged), 1024–49151 registered, 49152–65535 ephemeral (the range set aside for ports the kernel assigns to clients; some systems, Linux among them, use a different default range).
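The three ranges can be captured in a small classifier, and binding to port 0 shows the kernel choosing a client port. This is a sketch: the function names are hypothetical, and the port the kernel hands out depends on the operating system's configuration, so it may fall outside 49152–65535.

```python
import socket

def port_class(port: int) -> str:
    """Classify a port number using the three ranges listed above."""
    if not 0 <= port <= 65535:
        raise ValueError("port must fit in 16 bits")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "ephemeral"

def kernel_assigned_port() -> int:
    """Bind to port 0 and report which port the kernel picked."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

`port_class(22)` returns `"well-known"`, `port_class(8080)` returns `"registered"` and `port_class(50000)` returns `"ephemeral"`.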
Protocol choice by application (the "protocol usage by common internet applications" table):
| Application | Transport | Why |
|---|---|---|
| HTTP/HTTPS, FTP, SMTP, SSH | TCP | needs reliability & streams |
| DNS | UDP (TCP fallback for big answers/zone transfers) | small request/response |
| DHCP, SNMP, TFTP | UDP | simplicity, broadcast needs |
| NTP/SNTP | UDP | timestamps hate retransmission delays |
| Voice/video (RTP), games | UDP | timeliness over reliability |
| Routing: RIP uses UDP, BGP uses TCP, OSPF rides raw IP | mixed | fit for purpose |
1.7 The TCP state transition diagram — a guided walkthrough
The state diagram is the single most-asked figure from this unit. Don't memorise it as a picture; memorise it as two journeys through the states:
Journey 1 — the client (active open):
CLOSED --connect(): send SYN--> SYN_SENT --recv SYN+ACK, send ACK--> ESTABLISHED
ESTABLISHED --close(): send FIN--> FIN_WAIT_1 --recv ACK--> FIN_WAIT_2
FIN_WAIT_2 --recv FIN, send ACK--> TIME_WAIT --2·MSL timer--> CLOSED
Journey 2 — the server (passive open):
CLOSED --listen()--> LISTEN --recv SYN, send SYN+ACK--> SYN_RCVD
SYN_RCVD --recv ACK--> ESTABLISHED
ESTABLISHED --recv FIN, send ACK--> CLOSE_WAIT (application still has the socket open!)
CLOSE_WAIT --close(): send FIN--> LAST_ACK --recv ACK--> CLOSED
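The two journeys can also be written as a lookup table and replayed, which is a handy way to check that you can reproduce them. This sketch covers only the transitions listed above (no simultaneous open or close, no RST); the event labels and the `walk` helper are assumptions for illustration.

```python
# (current state, event) -> next state, for the two journeys only.
TRANSITIONS = {
    # Journey 1: active open, active close
    ("CLOSED", "connect"): "SYN_SENT",
    ("SYN_SENT", "recv SYN+ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    # Journey 2: passive open, passive close
    ("CLOSED", "listen"): "LISTEN",
    ("LISTEN", "recv SYN"): "SYN_RCVD",
    ("SYN_RCVD", "recv ACK"): "ESTABLISHED",
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}

def walk(events, state="CLOSED"):
    """Return the list of states visited while applying the events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]   # KeyError = illegal transition
        path.append(state)
    return path

CLIENT_EVENTS = ["connect", "recv SYN+ACK", "close",
                 "recv ACK", "recv FIN", "2MSL timeout"]
SERVER_EVENTS = ["listen", "recv SYN", "recv ACK",
                 "recv FIN", "close", "recv ACK"]
```

`walk(CLIENT_EVENTS)` passes through TIME_WAIT and `walk(SERVER_EVENTS)` passes through CLOSE_WAIT and LAST_ACK; both start and end in CLOSED.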
Three observations that turn a description into a full-marks answer:
- The active closer takes the TIME_WAIT branch; the passive closer takes the CLOSE_WAIT → LAST_ACK branch. Whichever end calls close first pays the 2·MSL penalty — which is why well-designed protocols often arrange for the client to close first (servers can't afford thousands of TIME_WAIT slots... though busy web servers suffer exactly this).
- **CLOSE_WAIT means "the peer has finished; your application hasn't called close yet".** A server stuck with growing CLOSE_WAIT counts in netstat has a descriptor leak: it never noticed read() returning 0. This is a real-world debugging heuristic worth quoting.
- Simultaneous open and simultaneous close exist (both ends SYN, or both FIN, at once): rare paths through SYN_RCVD and CLOSING, worth one sentence in a long answer to show you've seen the full diagram.
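The CLOSE_WAIT bug comes down to one missed check. The sketch below shows a read loop that does notice end-of-stream and closes its descriptor; the function names are hypothetical, and `socket.socketpair()` is used only to get a connected stream pair without a network (it has no TCP states of its own, but the "recv returns empty bytes when the peer closes" behaviour is the same).

```python
import socket

def serve_until_eof(conn) -> int:
    """Read until the peer closes, then close our side too."""
    total = 0
    with conn:                       # guarantees close() even on error
        while True:
            data = conn.recv(4096)
            if not data:             # b"": the peer has sent its FIN
                break                # leaving the loop lets 'with' close us
            total += len(data)
    return total

def eof_demo():
    ours, peer = socket.socketpair()
    peer.sendall(b"x" * 10)
    peer.close()                     # the peer finishes first
    received = serve_until_eof(ours)
    # fileno() is -1 once a Python socket has been closed.
    return received, ours.fileno()
```

A server that ignores the empty read, or breaks out of the loop without closing, leaves the connection in CLOSE_WAIT for as long as the process keeps the descriptor.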
1.8 Flow control vs congestion control — don't conflate them
Students lose marks by treating these as synonyms. They solve different problems with different mechanisms:
| Aspect | Flow control | Congestion control |
|---|---|---|
| Protects | the receiver (its buffer) | the network (router queues) |
| Signal | advertised window field in every TCP header | inferred: packet loss (timeout, dup ACKs), delay |
| Mechanism | sliding window — sender may have ≤ window unACKed bytes | cwnd: slow start, congestion avoidance (AIMD), fast retransmit/recovery |
| Who states the limit | receiver states it explicitly | sender estimates it |
| Exists in UDP? | no | no |
The sender's real transmission limit is min(advertised window, congestion window). The sliding window itself is worth narrating once: bytes to the left of the window are sent and ACKed; bytes inside are either sent but unACKed or free to send; bytes to the right must wait. As ACKs arrive the left edge slides right, hence the name. If the receiver's buffer fills, it advertises a window of 0 and the sender stalls, sending occasional window probes (driven by the persist timer) so that a single lost window update cannot deadlock the connection.
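The min() rule and the sliding-window bookkeeping fit in a few lines. This is a sketch of the arithmetic only, not of a real TCP implementation; the function and parameter names are assumptions, with `last_acked` standing for the left edge of the window and `next_to_send` for the first byte not yet sent.

```python
def effective_window(advertised_window: int, cwnd: int) -> int:
    """The sender may have at most this many unACKed bytes in flight."""
    return min(advertised_window, cwnd)

def sendable_bytes(last_acked: int, next_to_send: int,
                   advertised_window: int, cwnd: int) -> int:
    """Bytes the sender may still transmit right now."""
    in_flight = next_to_send - last_acked          # sent but not yet ACKed
    room = effective_window(advertised_window, cwnd) - in_flight
    return max(0, room)                            # never negative
```

With a 32 KB advertised window and an 8 KB cwnd, `effective_window(32 * 1024, 8 * 1024)` is 8192: the congestion window is the binding limit. With an advertised window of 0, `sendable_bytes` returns 0 whatever the cwnd, which is the stalled-sender case.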
1.9 Choosing a transport — the decision table
When the exam says "justify the choice of transport for application X", run this checklist:
| If the application needs... | Choose | Because |
|---|---|---|
| every byte, in order, eventually | TCP | reliability is the whole product |
| request/response of one small message | UDP (+ app retry) | a handshake costs 1 RTT before byte one; UDP completes in that time |
| broadcast or multicast delivery | UDP | TCP is strictly point-to-point |
| smooth real-time flow, late data useless | UDP (RTP on top) | retransmitted audio arrives too late to play |
| thousands of tiny clients on one server | often UDP | no per-connection kernel state |
| message boundaries preserved | UDP (or framing over TCP) | TCP is a stream |
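The "UDP (+ app retry)" row means the application supplies the reliability it needs. The sketch below shows a request with a timeout and a bounded number of retries, exercised against a loopback server that deliberately ignores the first datagram. All names, the 0.2-second timeout and the retry count are assumptions for illustration; a real client would also match replies to requests, for example with a transaction ID.

```python
import socket
import threading

def request(addr, payload: bytes, timeout: float = 0.2, retries: int = 3):
    """Send a datagram and wait for a reply, resending on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for attempt in range(1, retries + 1):
            s.sendto(payload, addr)
            try:
                reply, _ = s.recvfrom(2048)
                return reply, attempt
            except socket.timeout:
                continue                 # no reply in time: try again
    raise TimeoutError("no reply after %d attempts" % retries)

def lossy_upper_server(sock, drop_first: int = 1) -> None:
    """Ignore the first few datagrams, then answer one in upper case."""
    dropped = 0
    while True:
        data, addr = sock.recvfrom(2048)
        if dropped < drop_first:
            dropped += 1                 # simulate a lost request
            continue
        sock.sendto(data.upper(), addr)
        return

def retry_demo():
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))           # kernel picks the port
    worker = threading.Thread(target=lossy_upper_server,
                              args=(srv,), daemon=True)
    worker.start()
    try:
        return request(srv.getsockname(), b"ping")
    finally:
        worker.join(timeout=1.0)
        srv.close()
```

In this demo the first request is dropped, so the reply arrives on the second attempt. No connection state exists on either side, which is the property the table's last two UDP rows rely on.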
Exam pointers
- "Explain the TCP three-way handshake and connection termination with a state diagram" — draw the two journeys above; explicitly say four segments for termination and why TIME_WAIT lasts 2·MSL (lost-final-ACK + old-duplicates: both reasons, verbatim).
- "Differentiate TCP and UDP" — the §1.3 table is the answer; add the stream-boundary warning as your closing sentence.
- Short-answer favourites: define MSS and relate it to MTU (MSS = MTU − 40 for IPv4); why 576 bytes matters for UDP; which field implements flow control (the 16-bit window).
Check yourself
- Which end of a TCP connection enters TIME_WAIT, and what two failures would occur if the state were skipped?
- A socket sits in CLOSE_WAIT for hours. What did the remote end do, and what bug does the local application have?
- State the sender's effective window when the receiver advertises 32 KB but cwnd is 8 KB.
- Why does DNS use UDP for queries but TCP for zone transfers?
- One write() of 4000 bytes is read by the peer as 2920 + 1080 bytes. Is this a bug? Which §1.3 property explains it?