1.1 The echo pair — the course's laboratory animal
The server echoes back every line a client sends. Server child logic and client loop:
/* server: per-connection work */
void str_echo(int sockfd) {
    ssize_t n;
    char buf[MAXLINE];
again:
    while ((n = read(sockfd, buf, MAXLINE)) > 0)
        writen(sockfd, buf, n);
    if (n < 0 && errno == EINTR)
        goto again;                          /* interrupted → retry */
    else if (n < 0)
        err_sys("str_echo: read error");
    /* n == 0: client closed → return, child exits */
}
/* client: read stdin → socket → read socket → stdout */
void str_cli(FILE *fp, int sockfd) {
    /* named sendline/recvline so they do not shadow send() and recv() */
    char sendline[MAXLINE], recvline[MAXLINE];
    while (fgets(sendline, MAXLINE, fp) != NULL) {
        writen(sockfd, sendline, strlen(sendline));
        if (readline(sockfd, recvline, MAXLINE) == 0)
            err_quit("str_cli: server terminated prematurely");
        fputs(recvline, stdout);
    }
}
Normal startup: server socket→bind→listen→accept (blocks); client socket→connect; handshake completes; client blocks in fgets. Normal termination: user types EOF (Ctrl-D) → fgets returns NULL → client exit → kernel closes the socket (FIN) → server's read returns 0 → child exits → kernel sends SIGCHLD to the parent.
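Both listings call writen without showing it. The following is a minimal sketch of what such a helper usually does, assuming it means "write exactly n bytes or fail". The name writen_sketch is hypothetical, and the course's own writen may differ in detail:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch of a writen-style helper (hypothetical name): write exactly
 * n bytes to fd, looping over short writes and retrying after EINTR.
 * Returns n on success, -1 on error (errno as set by write). */
ssize_t writen_sketch(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;            /* interrupted by a signal: try again */
            return -1;               /* real error */
        }
        p += w;                      /* short write: advance and keep going */
        left -= (size_t)w;
    }
    return (ssize_t)n;
}
```

A plain write may return fewer bytes than requested on a socket, which is why the echo code never calls it directly for output.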
1.2 POSIX signals & handling SIGCHLD
A signal is an asynchronous notification delivered to a process. Install handlers with sigaction, usually through a portable signal() wrapper built on it. Key rules: a child that exits becomes a zombie until its parent calls wait/waitpid; unreaped zombies accumulate until the process table is exhausted, so every forking server must reap its children:
void sig_chld(int signo) {
    pid_t pid;
    int stat;
    while ((pid = waitpid(-1, &stat, WNOHANG)) > 0)   /* loop! */
        ;
}
signal(SIGCHLD, sig_chld); /* install before the accept loop */
Why waitpid with WNOHANG in a loop, not plain wait: standard signals are not queued. If five children die while SIGCHLD is blocked or already pending, the handler runs once, not five times; the loop reaps all five, and WNOHANG makes waitpid return 0 instead of blocking when the remaining children are still running. A guaranteed exam question.
Interrupted system calls: a caught signal makes slow calls (accept, read) return −1 with errno == EINTR — servers must restart them (the goto again above, or SA_RESTART).
1.3 The failure-mode catalogue (the heart of this lesson)
What does the client experience when things go wrong? Five scenarios, each traced from what appears on the wire to what the client sees:
① Server child process crashes (kill the child): the kernel closes the child's descriptors and sends FIN. The client's readline would see the EOF, but the client is blocked in fgets. When the user types the next line, the client writes it to a half-closed connection and the server host answers with RST; the client's readline, called straight after the write, returns 0 because of the FIN already received → "server terminated prematurely". Lesson: the client is blocked on the wrong descriptor, which motivates select (next lesson).
② Client writes to a socket that has received RST: the write raises SIGPIPE, which kills the process by default; if the signal is ignored or caught, write returns −1 with errno EPIPE. In the scenario-① sequence this is the client's second write, because the first one is what elicits the RST. Servers/clients that mustn't die: signal(SIGPIPE, SIG_IGN).
③ Server host crashes (unplug it): nothing is sent. The client's write succeeds locally; TCP retransmits ~12 times over ~9 minutes; readline finally fails with ETIMEDOUT. Detecting the crash sooner, or on a connection that is not sending anything, requires either application timeouts or SO_KEEPALIVE.
④ Server host crashes and reboots: the rebooted kernel knows nothing of old connections → responds to the client's retransmitted data with RST → client gets ECONNRESET.
⑤ Server host shutdown (orderly): init sends SIGTERM, then SIGKILL to whatever is still running; processes die; kernel closes descriptors → FIN. To the client this looks like scenario ①.
| Scenario | What client receives | errno / symptom |
|---|---|---|
| ① child crashes | FIN, then RST on next write | readline → 0; SIGPIPE on 2nd write (②) |
| ③ host crashes | nothing | ETIMEDOUT after ~9 min |
| ③ host crashes, a router reports it | ICMP host unreachable | EHOSTUNREACH |
| ④ host reboots | RST on retransmission | ECONNRESET |
| ⑤ host shuts down (orderly) | FIN | same as ① |
The moral: a TCP endpoint learns nothing about its peer unless it sends data (or uses keep-alives). Until then, silence from a dead peer is indistinguishable from silence from an idle one.
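The two remedies named under scenario ③ can be sketched with standard socket options. This is an illustration only: arm_dead_peer_detection is a hypothetical helper, and SO_RCVTIMEO is just one possible way to implement an application timeout.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to probe an idle connection
 * (SO_KEEPALIVE) and bound every blocking read on fd with a receive
 * timeout (SO_RCVTIMEO). Returns 0 on success, -1 if either
 * setsockopt call fails. */
int arm_dead_peer_detection(int fd, long timeout_ms)
{
    int on = 1;
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;
    return 0;
}
```

After this call, a read on a silent connection returns −1 with errno EAGAIN (or EWOULDBLOCK) once the timeout expires, so the application decides how long silence is acceptable instead of waiting for TCP to give up.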
1.4 RST vs FIN — the two ways a connection ends
Half of the catalogue above turns on this distinction, so make it explicit:
| Property | FIN (orderly release) | RST (abortive release) |
|---|---|---|
| Meaning | "I have no more data to send" | "this connection does not exist / is being destroyed" |
| Triggered by | close(), shutdown(SHUT_WR), process exit | SYN to a closed port; data to a dead connection; SO_LINGER(on,0) close; closing with unread data queued |
| Buffered data | delivered first — FIN queues behind data | discarded in both directions immediately |
| Peer's read() sees | 0 (EOF) after draining data | −1 with ECONNRESET |
| Acknowledged? | yes — FIN consumes a sequence number | no — RST is never ACKed, never retransmitted |
| Half-close possible | yes (the other direction lives on) | no — everything dies |
The detail that distinguishes a top answer: a FIN means only "no more data from me" — it is a half-close, and the receiving application learns about it as a normal EOF, not an error. An RST is the kernel disavowing the connection entirely; nothing about it is graceful.
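The "Peer's read() sees" row of the table can be reproduced on the loopback interface. The sketch below is a hypothetical experiment (read_after_peer_close is not course code) that assumes 127.0.0.1 is usable: one process holds both ends of a TCP connection, closes the server end either normally or with SO_LINGER set to on/0, and reports what the client's read returns.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns what the client's read() reports after the server end has
 * closed and any payload has been drained. On -1, *err holds errno.
 * Returns -2 if the setup itself failed. */
ssize_t read_after_peer_close(int abortive, int *err)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct linger lg;
    char buf[16];
    ssize_t n = -2;
    int lfd, cfd, sfd = -1;

    *err = 0;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                           /* kernel picks a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0) goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    if (listen(lfd, 1) < 0) goto out;
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    if ((sfd = accept(lfd, NULL, NULL)) < 0) goto out;

    if (abortive) {
        memset(&lg, 0, sizeof(lg));
        lg.l_onoff = 1;                          /* linger on ...             */
        lg.l_linger = 0;                         /* ... with zero timeout: RST */
        if (setsockopt(sfd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
            goto out;
    } else {
        if (write(sfd, "abc", 3) != 3)           /* data queued ahead of the FIN */
            goto out;
    }
    close(sfd);
    sfd = -1;

    if (!abortive && read(cfd, buf, sizeof(buf)) != 3)
        goto out;                                /* payload is delivered first */
    n = read(cfd, buf, sizeof(buf));             /* 0 after FIN, -1 after RST */
    if (n < 0)
        *err = errno;
out:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return n;
}
```

In the orderly case the client first receives the three queued bytes and only then sees EOF; in the abortive case the read fails with ECONNRESET.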
1.5 SIGPIPE, slowly
The scenario-② rule deserves a precise statement, because the asymmetry is the exam point:
- First write after the peer has closed (its FIN has already arrived): the write succeeds locally (data enters the send buffer); the peer's kernel answers with RST; the local kernel marks the socket as reset.
- Second write: the kernel refuses. It raises SIGPIPE, whose default action terminates the process. If the signal is ignored or handled, write instead returns −1 with errno EPIPE.
Why a signal at all, instead of just an error? Heritage: for a shell pipeline like cmd1 | cmd2, when cmd2 exits, SIGPIPE silently kills cmd1 — the right default for pipes, a booby trap for servers. Hence the standard first line of robust network programs:
signal(SIGPIPE, SIG_IGN); /* turn process-killing signal into EPIPE return */
A server that mysteriously vanishes with no log output when clients disconnect mid-transfer is the textbook SIGPIPE symptom — worth quoting as the "how would you debug..." answer.
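The effect of SIG_IGN can be seen without a network. In the sketch below a UNIX-domain socketpair stands in for the TCP connection, which is a simplification: on a socketpair the very first write after the peer closes already fails, so the code shows only the conversion of SIGPIPE into an EPIPE return, not the first-write/second-write asymmetry. The function name write_to_closed_peer is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Ignore SIGPIPE, close one end of a socketpair, write to the other.
 * Returns the errno left by the failed write, 0 if the write
 * unexpectedly succeeded, or -1 on setup failure. */
int write_to_closed_peer(void)
{
    int sv[2];
    int err = 0;

    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    close(sv[1]);                        /* the peer goes away */
    if (write(sv[0], "x", 1) < 0)
        err = errno;                     /* an error return, not a dead process */
    close(sv[0]);
    return err;
}
```

With the default disposition the same write would terminate the process before write returned, which is exactly the silent death described above.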
1.6 The crashed-server timeline, end to end
A worked timeline for scenario ③ (host crash) ties the whole catalogue together — reproduce it when the question says "describe with a timeline":
t=0 client: writen("hello\n") — succeeds (local buffer); blocks in readline
t=0+ client TCP transmits the segment ... no ACK (host is dead, sends nothing)
t≈1.5s 1st retransmission ┐
t≈3s 2nd │ exponential backoff,
t≈6s 3rd │ ~12 attempts
... ┘
t≈9min TCP gives up → readline returns −1, errno = ETIMEDOUT
If a router knows the host is gone it may answer the retransmissions with ICMP host-unreachable → the error becomes EHOSTUNREACH instead. And the contrast to memorise: had the host rebooted meanwhile, the retransmitted data would meet a kernel with no record of the 4-tuple → RST → ECONNRESET much sooner. Three endings, three errnos, one experiment.
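The three endings, together with the FIN case from scenario ①, can be folded into one decision on read's return value and errno. The enum and function names below are hypothetical and the mapping simply restates the catalogue; it is a study aid, not a complete error handler.

```c
#include <errno.h>
#include <sys/types.h>

enum peer_state {
    PEER_SENT_DATA,      /* n > 0: ordinary data */
    PEER_CLOSED,         /* n == 0: FIN arrived (process died or host shut down) */
    PEER_HOST_SILENT,    /* ETIMEDOUT: retransmissions went unanswered */
    PEER_UNREACHABLE,    /* EHOSTUNREACH: a router reported the host gone */
    PEER_RESET,          /* ECONNRESET: RST, e.g. the host rebooted */
    CALL_INTERRUPTED,    /* EINTR: not a peer event, retry the call */
    OTHER_ERROR
};

/* Classify the outcome of a read() on a TCP socket, given its return
 * value n and the errno observed right after the call. */
enum peer_state classify_read(ssize_t n, int err)
{
    if (n > 0)
        return PEER_SENT_DATA;
    if (n == 0)
        return PEER_CLOSED;
    switch (err) {
    case ETIMEDOUT:    return PEER_HOST_SILENT;
    case EHOSTUNREACH: return PEER_UNREACHABLE;
    case ECONNRESET:   return PEER_RESET;
    case EINTR:        return CALL_INTERRUPTED;
    default:           return OTHER_ERROR;
    }
}
```

A caller would typically retry on CALL_INTERRUPTED and report the other error cases with the scenario they point to.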
1.7 Why the parent must reap — zombies, quantified
Each unreaped child holds a process-table slot storing its exit status (visible as <defunct> / Z state in ps). A server handling one client per second leaks 86,400 zombies a day — well past typical process limits — and then fork itself fails with EAGAIN: the server is dead even though the parent never crashed. The SIGCHLD + waitpid(WNOHANG) loop from §1.2 is therefore not hygiene but survival. Also say where the handler interacts with the main loop: the signal interrupts accept → EINTR → the loop must restart accept (the second half of the same exam answer).
Exam pointers
- "Explain what happens when the server process / server host crashes": give the FIN-vs-nothing distinction first, then the per-scenario errno; close with the "silence is indistinguishable" moral. The scenario table in §1.3 is the structure examiners mark against.
- "What is SIGPIPE and when is it generated?" — the first-write/second-write asymmetry is the answer; mention SIG_IGN + EPIPE.
- "Why waitpid with WNOHANG in a loop rather than wait?" — signals aren't queued; loop reaps all; WNOHANG avoids blocking when none remain. Three clauses, three marks.
Check yourself
- read() returns 0 vs read() returns −1 with ECONNRESET — what arrived on the wire in each case?
- Why does the client in scenario ① not learn about the server child's death until it sends the next line?
- Your server dies silently whenever a client closes its laptop mid-download. Name the signal and the one-line fix.
- Which scenarios in the catalogue would SO_KEEPALIVE detect, and roughly how long would detection take by default?
- After "kill -9" of the server child, which end is in FIN_WAIT_2 and which in CLOSE_WAIT? (Careful — kill -9 still produces an orderly FIN. Why?)