1. The problem
TCP is a byte-stream protocol, which means it does not preserve message boundaries.
- Fragmentation: If we ask the kernel to send 100 bytes of data, it might send them as a single packet, or it might break them into two packets of 50 bytes each (or any other combination).
- The Issue: When we read from a TCP socket, a single read() call might return fewer bytes than we requested. This is not an error; it’s normal behavior. It just means the data is arriving in pieces.
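A short read is easy to reproduce locally. The sketch below uses a pipe as a stand-in for a TCP socket (an assumption made only so the example needs no network setup); the helper name single_read_result is hypothetical and not from the book. It places some bytes in the pipe, issues one read() asking for more, and reports what that single call returned.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put `available` bytes into a pipe, then issue ONE
 * read() asking for `requested` bytes and return what read() reported.
 * A pipe stands in for a TCP socket here; both are byte streams. */
ssize_t single_read_result(size_t available, size_t requested)
{
    int fds[2];
    char out[256], in[256];
    ssize_t got;

    /* Keep the demo small; available == 0 would make read() block. */
    if (available == 0 || available > sizeof(out) || requested > sizeof(in))
        return -1;
    if (pipe(fds) < 0)
        return -1;

    memset(out, 'x', sizeof(out));
    if (write(fds[1], out, available) != (ssize_t)available) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    got = read(fds[0], in, requested);   /* a single call, no loop */

    close(fds[0]);
    close(fds[1]);
    return got;
}
```

Calling single_read_result(50, 100) returns 50: the call succeeds, but only the bytes that had already arrived are delivered. Code that assumes it received all 100 bytes would process garbage.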
2. The Solution: readn and writen
To handle this, we must write code that loops until all the data we expect has been read or written. The authors provide two custom functions to simplify this:
readn(fd, vptr, n)
- Purpose: Reads exactly n bytes from a descriptor fd.
- How it works: It calls read() inside a while loop.
  - If read() returns fewer bytes than requested (a “short read”), readn loops again to read the remaining bytes.
  - It only returns when:
    - All n bytes have been read.
    - read() returns 0 (EOF, meaning the connection was closed).
    - read() returns -1 (an error occurred).
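The loop described for readn can be sketched as follows. This is a minimal version written for these notes, not a copy of the authors' listing. Two details go beyond what the notes state and should be read as assumptions: a read() interrupted by a signal (EINTR) is retried rather than reported as an error, and on EOF the function returns the number of bytes read so far instead of 0.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to n bytes from fd into vptr, looping over short reads.
 * Returns the number of bytes read (less than n only on EOF), or -1 on error. */
ssize_t readn(int fd, void *vptr, size_t n)
{
    size_t nleft = n;        /* bytes still wanted */
    char *ptr = vptr;        /* where the next bytes go */

    while (nleft > 0) {
        ssize_t nread = read(fd, ptr, nleft);
        if (nread < 0) {
            if (errno == EINTR)
                continue;    /* assumption: retry after a signal */
            return -1;       /* real error */
        }
        if (nread == 0)
            break;           /* EOF: peer closed the connection */
        nleft -= (size_t)nread;
        ptr += nread;
    }
    return (ssize_t)(n - nleft);
}
```

A caller that needs a fixed-size record can compare the return value with n: a smaller non-negative value means the peer closed the connection before the full record arrived.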
writen(fd, vptr, n)
- Purpose: Writes exactly n bytes to a descriptor fd.
- How it works: It calls write() inside a while loop.
  - Similar to reading, a write() call can return fewer bytes than requested (e.g., if the socket buffer is full).
  - writen loops until all bytes are successfully handed to the kernel.
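The writing side mirrors the reading loop. The sketch below is a minimal version written for these notes, not the authors' listing. As an assumption beyond the notes, a write() interrupted by a signal (EINTR) is retried, and a return value of 0 from write() is treated as an error so the loop cannot spin forever.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly n bytes from vptr to fd, looping over short writes.
 * Returns n on success, or -1 on error. */
ssize_t writen(int fd, const void *vptr, size_t n)
{
    size_t nleft = n;            /* bytes not yet handed to the kernel */
    const char *ptr = vptr;      /* next byte to send */

    while (nleft > 0) {
        ssize_t nwritten = write(fd, ptr, nleft);
        if (nwritten <= 0) {
            if (nwritten < 0 && errno == EINTR)
                continue;        /* assumption: retry after a signal */
            return -1;           /* real error, or no progress */
        }
        nleft -= (size_t)nwritten;
        ptr += nwritten;
    }
    return (ssize_t)n;
}
```

Note that a successful return only means the kernel accepted the bytes into its send buffer; it says nothing about whether the peer has received them.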
3. The readline function
Many network protocols (like HTTP, SMTP, and FTP) are text-based and process data one “line” at a time, with each line terminated by a newline character \n.
- The Challenge: We can’t just ask the kernel to “read a line” because the kernel doesn’t know what a “line” is; it just sees a stream of bytes.
- The Slow Way: We could read 1 byte at a time in a loop, checking for \n. This works but is incredibly slow because every single byte requires its own system call (a costly switch into the kernel).
- The Fast Way (Buffered): The authors’ readline function uses an internal buffer.
  - It reads a large chunk of data (e.g., 4096 bytes) from the socket into a static buffer with a single read() call, and refills the buffer only when it runs empty.
  - It then hands us data byte-by-byte from this internal buffer.
  - This minimizes expensive system calls while still allowing us to write code that looks like it’s reading character-by-character.
- Warning: Because readline uses a static buffer, it is not thread-safe. (Later in Chapter 26, a thread-safe version is introduced).
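The buffered approach can be sketched as below. This is a simplified illustration written for these notes rather than the authors' code: the helper buffered_getc, the rl_ names, and the RL_BUFSIZE constant are hypothetical, and the EINTR retry is an assumption. The key idea is that read() is called only when the static buffer is empty, while the caller still sees one byte at a time.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define RL_BUFSIZE 4096            /* chunk size, matching the figure in the notes */

static ssize_t rl_cnt;             /* unread bytes left in rl_buf */
static char   *rl_ptr;             /* next unread byte in rl_buf */
static char    rl_buf[RL_BUFSIZE]; /* static buffer: the reason this is not thread-safe */

/* Hand out one byte; call read() only when the buffer is empty.
 * Returns 1 on success, 0 on EOF, -1 on error. */
static ssize_t buffered_getc(int fd, char *c)
{
    while (rl_cnt <= 0) {
        rl_cnt = read(fd, rl_buf, sizeof(rl_buf));
        if (rl_cnt < 0) {
            if (errno == EINTR)
                continue;          /* assumption: retry after a signal */
            return -1;
        }
        if (rl_cnt == 0)
            return 0;              /* EOF */
        rl_ptr = rl_buf;
    }
    rl_cnt--;
    *c = *rl_ptr++;
    return 1;
}

/* Read one line (including the '\n') into vptr, storing at most maxlen-1
 * bytes plus a terminating '\0'. Returns bytes stored, 0 on EOF, -1 on error. */
ssize_t readline(int fd, void *vptr, size_t maxlen)
{
    char *ptr = vptr;
    size_t n = 0;
    char c;

    if (maxlen == 0) {
        errno = EINVAL;
        return -1;
    }
    while (n + 1 < maxlen) {
        ssize_t rc = buffered_getc(fd, &c);
        if (rc < 0)
            return -1;
        if (rc == 0)
            break;                 /* EOF: return what we have */
        ptr[n++] = c;
        if (c == '\n')
            break;                 /* end of line */
    }
    ptr[n] = '\0';
    return (ssize_t)n;
}
```

In this sketch the buffer is shared by every caller and every descriptor, so besides being unsafe across threads, it should only be used on one descriptor at a time. Mixing it with direct read() calls on the same descriptor would also skip over bytes already pulled into the buffer.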