Socket Programming Mental Model
Sockets are file descriptors that let applications send and receive data over networks — understanding them is understanding how all networking actually works.
The Problem
How does an application communicate over a network at the operating system level, and how does it scale from handling one connection to handling millions?
Mental Model
Like setting up a telephone switchboard — plug in lines (bind), listen for incoming calls (listen), accept them (accept), route conversations (read/write), and hang up (close).
How It Works
Every networked application — from nginx handling 100K connections to a Python script fetching a URL — uses sockets. A socket is the operating system's abstraction for a network endpoint. It's a file descriptor (an integer) that can be read from and written to, just like a file. The kernel handles all the TCP/IP complexity behind this simple interface.
The Server Side: bind → listen → accept
Here's what actually happens inside the kernel when a server starts:
# Pseudocode for a TCP server
fd = socket(AF_INET, SOCK_STREAM, 0) # Create a TCP socket
# Returns: file descriptor (e.g., 3)
# Kernel allocates: socket structure, send/receive buffers
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, 1)
# Allows bind() to succeed even if the port is in TIME_WAIT
bind(fd, ("0.0.0.0", 8080))
# Kernel registers: "port 8080 belongs to this socket"
# The socket is now associated with an address
listen(fd, 128)
# Kernel creates TWO queues:
# 1. SYN queue (half-open connections: SYN received, SYN-ACK sent)
# 2. Accept queue (fully established connections waiting for accept())
# The backlog argument (128) sets the accept queue size
client_fd, addr = accept(fd)
# Kernel dequeues one connection from the accept queue
# Returns: NEW file descriptor for this specific client
# The original fd (3) keeps listening for new connections
This is the critical mental model: accept() creates a new file descriptor. After accepting, there are two fds: the listening socket (which continues accepting) and the client socket (which carries data for this specific connection).
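For readers who want to poke at this directly, here is a minimal runnable sketch of the same sequence using Python's standard socket module (the port 8080, the single echo, and the print calls are illustrative choices, not part of the pseudocode above). Printing both fileno() values makes the two-descriptor point concrete.
# Minimal runnable sketch of socket → bind → listen → accept (illustrative port/behaviour)
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)     # socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)   # setsockopt()
srv.bind(("0.0.0.0", 8080))                                 # bind()
srv.listen(128)                                             # listen() with backlog 128
print("listening fd:", srv.fileno())

conn, addr = srv.accept()                                   # accept() returns a NEW socket
print("client fd:", conn.fileno(), "peer:", addr)           # two fds now exist
conn.sendall(conn.recv(4096))                               # echo one read back
conn.close()                                                # closes only the client fd
srv.close()                                                 # the listening fd is separate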
The Client Side: connect
# Pseudocode for a TCP client
fd = socket(AF_INET, SOCK_STREAM, 0)
connect(fd, ("93.184.216.34", 80))
# Kernel performs the 3-way handshake:
# 1. Sends SYN to the server
# 2. Receives SYN-ACK
# 3. Sends ACK
# connect() blocks until the handshake completes (or times out)
write(fd, b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
response = read(fd, 4096)
close(fd)
The kernel automatically assigns an ephemeral port (e.g., 52431) to the client socket. The connection is uniquely identified by the 4-tuple: (client IP, client port, server IP, server port).
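A small sketch makes the 4-tuple visible from the client side: getsockname() reports the kernel-assigned ephemeral port (example.com:80 is only an illustrative target; any reachable TCP server works).
# Sketch: observing the ephemeral port and the 4-tuple from a client socket
import socket

c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
c.connect(("example.com", 80))          # kernel picks the local ephemeral port
local = c.getsockname()                 # (client IP, client port)
remote = c.getpeername()                # (server IP, server port)
print("4-tuple:", local + remote)
c.close()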
The Backlog Queue: What listen(fd, 128) Really Means
The backlog parameter is one of the most misunderstood concepts in socket programming. It does NOT limit concurrent connections. It limits the queue of completed connections waiting to be accept()ed.
                  ┌─────────────────┐
  SYN arrives →   │    SYN Queue    │  (half-open: SYN received, SYN-ACK sent)
                  │ (kernel-managed)│
                  └────────┬────────┘
                           │ ACK arrives (handshake complete)
                  ┌────────▼────────┐
                  │  Accept Queue   │  (fully established, waiting for accept())
                  │ size = backlog  │
                  └────────┬────────┘
                           │ accept() called by application
                  ┌────────▼────────┐
                  │   Application   │  (now reading/writing data)
                  └─────────────────┘
If the accept queue is full (the application is too slow to call accept()), new connections are dropped — the kernel sends TCP RST or simply ignores the final ACK, depending on configuration. Under a SYN flood attack, the SYN queue fills up, which is why SYN cookies exist (the kernel validates SYN-ACKs without storing state).
# Check listen queue on Linux
ss -tlnp
# Recv-Q: current queue depth
# Send-Q: maximum queue size (backlog)
# If Recv-Q is approaching Send-Q, the app can't accept() fast enough
# Check for overflows
nstat -az | grep -i listen
# TcpExtListenOverflows: connections dropped because accept queue was full
# TcpExtListenDrops: total connections dropped on listening sockets
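To see this behaviour without waiting for a production incident, it can be reproduced deliberately. The sketch below (an illustrative experiment — the tiny backlog of 2 and port 9999 are arbitrary) listens but never calls accept(), so a handful of connections from another terminal, e.g. repeated `nc 127.0.0.1 9999`, will push Recv-Q up toward Send-Q in the ss output.
# Sketch: watch the accept queue fill when the application never calls accept()
import socket, time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 9999))
srv.listen(2)                            # deliberately tiny backlog
print("listening but never accepting; run `ss -tln | grep 9999` in another shell")
time.sleep(300)                          # connect repeatedly and watch Recv-Q climb toward Send-Q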
Blocking vs Non-Blocking I/O
Blocking I/O (Thread-per-Connection)
The simplest model: one thread per client. Each thread calls read() and blocks until data arrives.
# Blocking server — one thread per connection
while True:
    client_fd = accept(listen_fd)       # Blocks until new connection
    thread = Thread(target=handle, args=(client_fd,))
    thread.start()

def handle(fd):
    while True:
        data = read(fd, 4096)           # Blocks until data arrives
        if not data: break
        write(fd, process(data))        # Blocks until write buffer has space
    close(fd)
This works fine for 100 connections. At 10,000 connections, the server has 10,000 threads. Each thread consumes ~1 MB of stack space (10 GB total), and the OS scheduler thrashes trying to context-switch between them. This is the C10K problem that motivated event-driven architectures.
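As a runnable counterpart to the pseudocode above, here is a thread-per-connection echo server in Python (the port and the echo behaviour are illustrative assumptions for the sketch, not a production pattern).
# Runnable sketch of the thread-per-connection model: one OS thread per client
import socket, threading

def handle(conn):
    while True:
        data = conn.recv(4096)           # blocks until data arrives
        if not data:
            break
        conn.sendall(data)               # blocks if the send buffer is full
    conn.close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))
srv.listen(128)
while True:
    conn, _ = srv.accept()               # blocks until a new connection arrives
    threading.Thread(target=handle, args=(conn,), daemon=True).start()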
Non-Blocking I/O with I/O Multiplexing
The solution: make sockets non-blocking and use a single thread to monitor thousands of them.
# Non-blocking server with epoll (Linux) — single thread, many connections
epoll_fd = epoll_create()
listen_fd = socket(...)
fcntl(listen_fd, F_SETFL, O_NONBLOCK) # Make non-blocking
bind(listen_fd, addr)
listen(listen_fd, 128)
epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, EPOLLIN)
while True:
    events = epoll_wait(epoll_fd, timeout=1000)  # Wait for ready fds
    for fd, event in events:
        if fd == listen_fd:
            client_fd = accept(listen_fd)        # Won't block
            fcntl(client_fd, F_SETFL, O_NONBLOCK)
            epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_fd, EPOLLIN)
        elif event & EPOLLIN:
            data = read(fd, 4096)                # Won't block
            if data:
                process_and_respond(fd, data)
            else:
                epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd)
                close(fd)
This is how nginx works. A single worker process handles tens of thousands of connections because it never blocks — it only processes file descriptors that are ready.
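A portable way to experiment with this model is Python's selectors module, which picks epoll on Linux and kqueue on BSD/macOS. The sketch below is a single-threaded echo server under those assumptions (port 8080 is again arbitrary, and a real server would also buffer partial writes instead of calling sendall on a non-blocking socket).
# Single-threaded event-driven echo server using the standard-library selectors module
import selectors, socket

sel = selectors.DefaultSelector()            # epoll/kqueue chosen automatically

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))
srv.listen(128)
srv.setblocking(False)
sel.register(srv, selectors.EVENT_READ)

while True:
    for key, _ in sel.select(timeout=1):     # like epoll_wait: returns only ready fds
        sock = key.fileobj
        if sock is srv:
            conn, _ = srv.accept()           # won't block: the listener is ready
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)           # won't block: data is ready
            if data:
                sock.sendall(data)           # echo back (small payloads only in this sketch)
            else:
                sel.unregister(sock)
                sock.close()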
The Evolution: select → poll → epoll → io_uring
| Mechanism | Year | Scaling | How It Works |
|---|---|---|---|
| select() | 1983 | O(n), max 1024 fds | Passes a bitmap of fds to kernel, kernel scans all of them |
| poll() | 1986 | O(n), no fd limit | Passes an array of fds, kernel scans all of them |
| epoll (Linux) | 2002 | O(ready), no limit | Kernel maintains a set of fds, returns only ready ones |
| kqueue (BSD) | 2000 | O(ready), no limit | Like epoll but supports files, signals, timers too |
| io_uring | 2019 | O(1) amortized | Shared ring buffer between user/kernel, zero-copy, zero-syscall |
select() is the ancient approach: the caller passes a bitmask of up to 1024 file descriptors to the kernel, and it returns which ones are ready. The kernel scans every fd on every call — O(n) where n is the total number of fds, not the number that are ready.
epoll changed everything. Interest in fds is registered once with epoll_ctl(), and epoll_wait() returns only the fds that are ready. With 100,000 connections but only 10 ready, epoll does O(10) work, not O(100,000).
# See epoll in action — trace nginx worker
strace -e epoll_wait -p $(pgrep -f 'nginx: worker' | head -1) 2>&1 | head -20
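For a self-contained look at the same register-once pattern, Python exposes raw epoll through the standard select module on Linux. The sketch below (port 9000 and the 5-second wait are arbitrary choices) registers a listening socket once and then asks the kernel which fds are ready.
# Linux-only sketch of the raw epoll API: register interest once, poll for ready fds
import select, socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 9000))
srv.listen(128)

ep = select.epoll()                          # epoll_create()
ep.register(srv.fileno(), select.EPOLLIN)    # epoll_ctl(ADD) — done once per fd

events = ep.poll(5)                          # epoll_wait(): list of (fd, eventmask)
print("ready fds:", events)                  # empty list if nothing connects within 5s
ep.close()
srv.close()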
io_uring is the latest evolution (Linux 5.1+). It uses ring buffers shared between userspace and the kernel: the application places I/O requests on a submission queue, and the kernel posts results to a completion queue. Completions can be reaped without any syscall, and with kernel-side submission polling (SQPOLL) even the submit syscall can be skipped — which removes most syscall overhead for high-throughput I/O.
The TCP Byte Stream Problem
A critical gotcha that bites every new socket programmer: TCP is a byte stream, not a message stream. If a sender writes two messages of 100 bytes each, the receiver might get:
- One read() of 200 bytes (both messages concatenated)
- Two reads of 100 bytes each (clean split — lucky)
- Three reads: 50, 100, 50 bytes (split across message boundary)
This is by design. TCP provides a stream of bytes, like a pipe. It makes no guarantees about how bytes are grouped when read() returns.
The solution is message framing: a protocol layer that defines where messages start and end.
Approach 1: Length prefix
[4-byte length][payload bytes][4-byte length][payload bytes]
Approach 2: Delimiter
message content\r\n
another message\r\n
Approach 3: Fixed-size messages
[exactly 256 bytes per message, padded if needed]
HTTP uses a combination: a text header ending with \r\n\r\n, with a Content-Length header (or chunked encoding) specifying the body size. gRPC uses length-prefixed protobuf messages. Redis uses \r\n delimiters.
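As a concrete illustration of Approach 1, here is a sketch of length-prefix framing in Python. The 4-byte big-endian header and the helper names send_msg, recv_msg, and recv_exactly are illustrative choices, not taken from any particular protocol.
# Sketch of length-prefix framing over an already-connected socket `sock`
import struct

def send_msg(sock, payload: bytes):
    sock.sendall(struct.pack(">I", len(payload)) + payload)   # [4-byte length][payload]

def recv_exactly(sock, n: int) -> bytes:
    buf = b""
    while len(buf) < n:                        # one recv() may return fewer bytes than asked
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    (length,) = struct.unpack(">I", recv_exactly(sock, 4))    # read the header first
    return recv_exactly(sock, length)                          # then exactly that many payload bytes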
File Descriptor Limits
Every socket is a file descriptor. Linux has per-process and system-wide fd limits:
# Per-process limit (default often 1024!)
ulimit -n
# Increase for the current process
ulimit -n 65535
# System-wide limit
cat /proc/sys/fs/file-max
# See fd usage per process
ls /proc/$(pgrep nginx | head -1)/fd | wc -l
# Permanently increase limits in /etc/security/limits.conf
# nginx soft nofile 65535
# nginx hard nofile 65535
If the server hits the fd limit, accept() fails with EMFILE (too many open files) and new connections are refused. This is a common production issue that manifests as mysterious connection failures under load. Always set fd limits explicitly in the service configuration.
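From inside a process, the same limits are visible through getrlimit/setrlimit. A small sketch (the 65535 target is just an example; the hard limit caps what an unprivileged process may request):
# Sketch: reading and raising the per-process fd limit from Python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft:", soft, "hard:", hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (min(65535, hard), hard))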
Why This Matters
Engineers rarely write raw socket code in production — frameworks and libraries handle it. But understanding the socket layer is essential for:
- Debugging: When connections are refused, reset, or timing out, the answer is in socket state and kernel queues
- Configuration: nginx's worker_connections, Node's server.maxConnections, and database pool sizes all map directly to socket concepts
- Architecture: Choosing between thread-per-connection (simple but limited), event-driven (scalable but complex), and coroutine-based (Go, best of both) requires understanding the underlying I/O model
- Performance: Knowing that epoll is O(ready) while select is O(total) explains why nginx handles 10x the connections of Apache's prefork model
The socket API is 40 years old and hasn't fundamentally changed. Every networking innovation — HTTP/2, gRPC, QUIC — ultimately creates sockets and calls read() and write(). Master this layer, and everything above it makes sense.
Key Points
- A socket is just a file descriptor — read(), write(), and close() work on it like any other file. This is the Unix 'everything is a file' philosophy applied to networking
- The listen() backlog is NOT the max concurrent connections — it's the queue of connections that have completed the 3-way handshake but haven't been accept()ed yet
- accept() returns a BRAND NEW file descriptor for each client connection. The original listening socket stays open, ready for the next client
- Blocking I/O means one thread per connection, which doesn't scale past ~10K connections. Non-blocking I/O with epoll/kqueue handles millions
- The C10K problem (handling 10,000 concurrent connections) was solved by moving from thread-per-connection to event-driven I/O — this is how nginx, Node.js, and Go's runtime work
Key Components
| Component | Role |
|---|---|
| Socket File Descriptor | An integer handle returned by socket() that represents a network endpoint — everything in Unix is a file, including network connections |
| bind() | Associates a socket with a local IP address and port number, claiming that address for incoming connections |
| listen() + Backlog Queue | Marks a socket as passive (server) and sets the size of the queue for pending connections waiting to be accept()ed |
| accept() | Dequeues a completed TCP connection from the backlog and returns a NEW file descriptor for that specific client |
| I/O Multiplexing (epoll/kqueue) | Monitors thousands of file descriptors simultaneously, notifying the application only when data is ready — the foundation of event-driven servers |
When to Use
Understanding socket programming is essential for debugging any networking issue, configuring servers, and understanding why frameworks behave the way they do. Raw socket code is rare in production, but the mental model is non-negotiable.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| epoll (Linux) | Open Source | High-performance I/O multiplexing on Linux — cost scales with ready fds rather than total fds, handles millions of connections | Any |
| kqueue (BSD/macOS) | Open Source | I/O multiplexing on FreeBSD and macOS with unified event notification for sockets, files, signals, and timers | Any |
| io_uring (Linux 5.1+) | Open Source | Zero-copy, zero-syscall async I/O — the future of Linux networking for maximum throughput | Large-Enterprise |
| libuv | Open Source | Cross-platform async I/O library — powers Node.js, uses epoll/kqueue/IOCP under the hood | Any |
Debug Checklist
- Check open file descriptor count: ls /proc/PID/fd | wc -l or lsof -p PID | wc -l — approaching ulimit means the process is leaking sockets
- Monitor listen queue overflow: ss -tlnp shows Recv-Q (pending connections) and Send-Q (backlog size) — if Recv-Q approaches Send-Q, increase the backlog
- Verify socket options: ss -tlnp -o shows keepalive timers, SO_REUSEADDR, and other options on listening sockets
- Check for CLOSE_WAIT accumulation: ss -tnp | grep CLOSE_WAIT — this means the remote side closed but the application didn't call close()
- Trace socket syscalls: strace -e network -p PID shows every socket(), bind(), listen(), accept(), connect(), read(), write() call in real-time
Common Mistakes
- Forgetting SO_REUSEADDR when restarting a server. Without it, bind() fails with 'Address already in use' because the old socket is in TIME_WAIT
- Setting the listen backlog too small. Under burst traffic, new connections get dropped with TCP RST before accept() can process them
- Assuming one read() returns one complete message. TCP is a byte stream — a single read() may return half a message or three messages concatenated
- Blocking on accept() in a single-threaded server. While waiting for a new connection, existing clients can't be served — use I/O multiplexing
- Not handling EINTR (interrupted system call). Signals can interrupt any blocking syscall — always retry on EINTR
Real World Usage
- nginx uses epoll (Linux) or kqueue (FreeBSD) to handle tens of thousands of concurrent connections in a single worker process
- Redis is single-threaded but handles 100K+ operations/second because it uses I/O multiplexing (ae library wrapping epoll/kqueue)
- Node.js event loop is built on libuv, which uses epoll/kqueue for non-blocking socket I/O — one thread serves thousands of requests
- Go's runtime uses non-blocking sockets with epoll/kqueue internally, but presents a blocking API to goroutines via its scheduler
- HAProxy uses multi-threaded epoll to handle millions of concurrent TCP connections with sub-millisecond latency