WebRTC — Peer-to-Peer
WebRTC negotiates a direct encrypted path between browsers using signaling + ICE, falling back to a TURN relay when peer-to-peer fails.
The Problem
Real-time audio and video communication over the internet requires sub-200ms latency, NAT traversal, codec negotiation, encryption, and adaptation to changing network conditions. Traditional server-relayed architectures add latency and bandwidth cost. WebRTC solves this by establishing direct peer-to-peer connections with built-in encryption and congestion control.
Mental Model
Like two people trying to have a direct conversation across a crowded room — first they need a mutual friend (signaling server) to introduce them, then they find the best path to talk directly. If there is a wall between them (NAT/firewall), they either find a window (STUN) or ask someone to relay messages (TURN).
How It Works
WebRTC is the most complex protocol stack in web development. It is not one protocol — it is an orchestra of protocols working together to establish a direct, encrypted, real-time media connection between two browsers. Understanding the full lifecycle is essential before trying to build anything with it.
The Connection Lifecycle
Every WebRTC connection goes through five phases:
Phase 1: Signaling (Application Responsibility)
WebRTC deliberately does not define how peers discover each other. A signaling mechanism — typically a WebSocket server — is needed to exchange two pieces of information:
- SDP (Session Description Protocol): Describes what each peer can send/receive — codecs, media types, encryption fingerprints. Peer A creates an "offer," Peer B responds with an "answer."
- ICE Candidates: Network addresses (IP:port pairs) where each peer can be reached.
```javascript
// Peer A: create the offer
const peerConnection = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:turn.example.com', username: 'user', credential: 'pass' }
  ]
});

const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);

// Send the offer to Peer B via the signaling server
signalingServer.send({ type: 'offer', sdp: offer });
```
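The signaling server itself is trivially simple: it only ferries opaque messages between peers. A minimal in-memory sketch of that relay role (class and method names here are illustrative; a real deployment would sit behind a WebSocket server):

```javascript
// In-memory signaling relay sketch: blindly forward offers, answers,
// and ICE candidates between peers without inspecting them.
class SignalingRelay {
  constructor() {
    this.peers = new Map(); // peerId -> message handler
  }
  register(peerId, onMessage) {
    this.peers.set(peerId, onMessage);
  }
  send(fromId, message) {
    // The relay never parses SDP or candidates -- it just routes them.
    for (const [peerId, onMessage] of this.peers) {
      if (peerId !== fromId) onMessage(message);
    }
  }
}

const relay = new SignalingRelay();
const inboxB = [];
relay.register('A', () => {});
relay.register('B', (msg) => inboxB.push(msg));
relay.send('A', { type: 'offer', sdp: 'v=0 ...' });
console.log(inboxB[0].type); // 'offer'
```

Because the payloads are opaque to the relay, the transport can be anything: WebSocket, HTTP polling, even copy-pasted SDP.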
Phase 2: ICE Candidate Gathering
Once the local description is set, the browser starts discovering its own network addresses:
- Host candidates: Local IP addresses (192.168.1.x) — works when peers are on the same LAN.
- Server-reflexive candidates: Public IP:port as seen by a STUN server — works when at least one side has a permissive NAT.
- Relay candidates: Allocated address on a TURN server — always works, but adds latency and server cost.
```javascript
peerConnection.onicecandidate = (event) => {
  if (event.candidate) {
    // Send each candidate to the remote peer via signaling
    signalingServer.send({ type: 'candidate', candidate: event.candidate });
  }
};
```
Phase 3: ICE Connectivity Checks
ICE pairs up local and remote candidates and tests connectivity by sending STUN binding requests. It tries all pairs in priority order — direct connections first, then STUN-derived, then TURN relay. The first pair that gets a response wins.
This is the magic of ICE: it tries every possible network path in parallel and picks the best one that actually works.
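The priority order is not arbitrary: each candidate gets a numeric priority from a formula defined in RFC 8445, dominated by a type preference that ranks host above server-reflexive above relay. A sketch of that computation (the local preference and component defaults are illustrative):

```javascript
// Candidate priority per RFC 8445 section 5.1.2.1:
// priority = (2^24)*typePreference + (2^8)*localPreference + (256 - componentId)
const TYPE_PREFERENCE = { host: 126, srflx: 100, relay: 0 };

function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return (
    (1 << 24) * TYPE_PREFERENCE[type] +
    (1 << 8) * localPreference +
    (256 - componentId)
  );
}

// ICE checks candidate pairs in descending priority: direct paths first,
// STUN-derived next, TURN relay last.
const candidates = ['relay', 'host', 'srflx']
  .map((type) => ({ type, priority: candidatePriority(type) }))
  .sort((a, b) => b.priority - a.priority);

console.log(candidates.map((c) => c.type)); // [ 'host', 'srflx', 'relay' ]
```

The type preference dominating the formula is why a working LAN path always beats a working relay path.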
Phase 4: DTLS Handshake + SRTP Key Exchange
Once ICE finds a working path, DTLS (Datagram TLS) runs over that path to establish encryption keys. These keys are then used to encrypt media via SRTP (Secure Real-time Transport Protocol). This is not optional — WebRTC mandates encryption.
Phase 5: Media and Data Flow
Audio and video flow as SRTP packets directly between peers. Data channels use SCTP over DTLS for reliable or unreliable data delivery. The connection continuously monitors quality and adapts — reducing video resolution if bandwidth drops, switching ICE candidates if the network path changes.
NAT Traversal: The Hard Part
The reason WebRTC needs STUN and TURN is that most devices sit behind NAT (Network Address Translation). A laptop's IP might be 192.168.1.50, but the internet sees the router's IP 203.0.113.10. For a direct connection, both peers need to know each other's public IP and port.
STUN solves the easy case. The browser sends a packet to a STUN server, which replies with the public address: "203.0.113.10:54321." Now the peer knows its public-facing address and can share it with the other peer.
This works for most NAT types (full cone, restricted cone, port-restricted cone). But symmetric NATs assign a different external port for every destination. The port STUN discovers is only valid for talking to the STUN server, not the peer. About 20% of connections hit this problem.
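The difference between the two NAT behaviors can be shown with a toy model (entirely illustrative: a cone NAT reuses one external mapping per internal socket, while a symmetric NAT allocates a fresh external port per destination):

```javascript
// Toy NAT model: returns the external port assigned for a given
// (internal socket, destination) pair.
function makeNat(isSymmetric) {
  const mappings = new Map();
  let nextPort = 50000;
  return function translate(internalSocket, destination) {
    // Cone NAT keys on the internal socket only; symmetric NAT keys on
    // the (socket, destination) pair, so every destination gets a new port.
    const key = isSymmetric ? `${internalSocket}->${destination}` : internalSocket;
    if (!mappings.has(key)) mappings.set(key, nextPort++);
    return mappings.get(key);
  };
}

const cone = makeNat(false);
const symmetric = makeNat(true);

// Cone NAT: the port STUN discovers is reusable for reaching the peer.
console.log(cone('10.0.0.5:4444', 'stun') === cone('10.0.0.5:4444', 'peer')); // true
// Symmetric NAT: STUN's answer is useless for reaching the peer.
console.log(symmetric('10.0.0.5:4444', 'stun') === symmetric('10.0.0.5:4444', 'peer')); // false
```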
TURN solves the hard case. The TURN server allocates a public address and relays all traffic. Both peers talk to the TURN server, which forwards packets between them. It always works, but every byte of media flows through the relay server — which is expensive.
Without TURN (80% of cases):
Peer A ←——direct P2P——→ Peer B
With TURN (20% of cases):
Peer A ←——→ TURN Server ←——→ Peer B
This is why TURN infrastructure is critical for production WebRTC. Skipping it means 1 in 5 users cannot connect.
Scaling Beyond Two Peers
WebRTC's peer-to-peer model works beautifully for 1-to-1 calls. But group calls expose a fundamental scaling problem: full mesh topology.
In a 4-person call with full mesh, each participant maintains 3 peer connections, sending their audio+video to each. That is 3 upload streams per person. In a 10-person call, it is 9 upload streams. A typical laptop's CPU and upload bandwidth cannot handle this.
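The arithmetic behind those numbers is worth making explicit:

```javascript
// Full-mesh cost for an n-person call: each peer uploads n-1 streams,
// and the call as a whole carries n*(n-1)/2 connections.
function meshCost(n) {
  return {
    uploadsPerPeer: n - 1,
    totalConnections: (n * (n - 1)) / 2,
  };
}

console.log(meshCost(4));  // { uploadsPerPeer: 3, totalConnections: 6 }
console.log(meshCost(10)); // { uploadsPerPeer: 9, totalConnections: 45 }
```

Per-peer upload cost grows linearly, but because every upload is a full encoded video stream, even the jump from 3 to 5 participants can saturate a residential uplink.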
Three architectures solve this:
Mesh (2-4 participants)
Every peer connects to every other peer. Simple, but each peer's upload bandwidth grows linearly: n−1 outgoing streams in an n-person call.
SFU — Selective Forwarding Unit (4-50 participants)
Each participant sends one stream to a central server. The SFU selectively forwards streams to each participant based on who is speaking, screen layout, and bandwidth. This is what Google Meet and most production conferencing systems use.
Peer A ──video──→ SFU ──→ Peer B (forwarded)
Peer B ──video──→ SFU ──→ Peer A (forwarded)
Peer C ──video──→ SFU ──→ Peer A, Peer B (forwarded)
The SFU does not transcode — it just forwards packets. This keeps CPU usage low while centralizing the routing decision.
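The forwarding logic can be sketched in a few lines (class and method names are illustrative, not any real SFU's API):

```javascript
// Toy SFU routing sketch: packets are forwarded verbatim -- never
// transcoded -- to every peer that subscribed to the sender.
class Sfu {
  constructor() {
    this.peers = new Map(); // peerId -> { onPacket, subscribedTo }
  }
  join(peerId, onPacket, subscribedTo) {
    this.peers.set(peerId, { onPacket, subscribedTo: new Set(subscribedTo) });
  }
  forward(senderId, packet) {
    for (const [peerId, peer] of this.peers) {
      // "Selective": only forward to peers subscribed to this sender,
      // e.g. because the sender is on screen or actively speaking.
      if (peerId !== senderId && peer.subscribedTo.has(senderId)) {
        peer.onPacket(senderId, packet);
      }
    }
  }
}

const sfu = new Sfu();
const received = [];
sfu.join('A', () => {}, ['B', 'C']);
sfu.join('B', (from) => received.push(`B<-${from}`), ['A']);
sfu.join('C', (from) => received.push(`C<-${from}`), ['A', 'B']);
sfu.forward('A', 'keyframe');
console.log(received); // [ 'B<-A', 'C<-A' ]
```

A real SFU adds congestion control, simulcast layer selection, and RTP rewriting, but the core routing decision is this simple.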
MCU — Multipoint Control Unit (legacy)
The MCU receives all streams, mixes them into a single composite stream, and sends that back. This uses massive server CPU but minimizes client bandwidth. Rarely used in modern systems — SFUs won.
Data Channels: The Hidden Gem
Most engineers associate WebRTC with video calls, but RTCDataChannel might be the more interesting API. It provides a direct, encrypted, peer-to-peer data pipe between browsers.
```javascript
const dataChannel = peerConnection.createDataChannel('files', {
  ordered: true,    // guarantee order (like TCP)
  // ordered: false, // no ordering (like UDP, lower latency)
});

dataChannel.onopen = () => {
  dataChannel.send('Hello from peer A!');
  // Or send binary data
  dataChannel.send(fileArrayBuffer);
};

dataChannel.onmessage = (event) => {
  console.log('Received:', event.data);
};
```
Data channels support both reliable (ordered, retransmitted) and unreliable (unordered, no retransmit) modes. This makes them viable for:
- Peer-to-peer file transfer: ShareDrop, Snapdrop
- Multiplayer gaming: Low-latency game state sync
- Collaborative editing: Conflict-free replicated data types (CRDTs) synced directly between clients
- Decentralized applications: No server needed for data exchange
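One practical detail for file transfer: a single `send()` call has practical message-size limits (around 16 KB is a common cross-browser-safe chunk size), so large payloads must be split. A sketch of that chunking, assuming the 16 KB figure:

```javascript
// Split a large buffer into data-channel-sized chunks.
const CHUNK_SIZE = 16 * 1024; // assumed cross-browser-safe message size

function* chunkBuffer(buffer, chunkSize = CHUNK_SIZE) {
  for (let offset = 0; offset < buffer.byteLength; offset += chunkSize) {
    yield buffer.slice(offset, offset + chunkSize);
  }
}

// Sender side would then be:
//   for (const chunk of chunkBuffer(fileArrayBuffer)) dataChannel.send(chunk);
const file = new ArrayBuffer(40 * 1024); // a hypothetical 40 KB file
const chunks = [...chunkBuffer(file)];
console.log(chunks.length); // 3 chunks: 16K + 16K + 8K
```

The receiver reassembles chunks in order (trivial on an `ordered: true` channel) and should also watch `bufferedAmount` to avoid overrunning the channel's send buffer.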
Production Considerations
Codec negotiation matters. VP8 is universally supported but aging. VP9 is better quality at lower bitrate. H.264 has hardware acceleration on most devices. AV1 is the future but adoption is still growing. The SDP configuration determines which codecs are offered and preferred.
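Codec preference used to be expressed by reordering payload types on the SDP `m=video` line (modern code should prefer `RTCRtpTransceiver.setCodecPreferences` where available). A sketch of the string-munging approach, with an illustrative two-codec SDP fragment:

```javascript
// Prefer a codec by moving its payload type(s) to the front of the
// m=video line. SDP munging is fragile; shown here to illustrate how
// the m-line ordering expresses preference.
function preferCodec(sdp, codecName) {
  const lines = sdp.split('\r\n');
  const rtpmap = new RegExp(`^a=rtpmap:(\\d+) ${codecName}/`, 'i');
  const payloadTypes = lines
    .map((line) => (line.match(rtpmap) || [])[1])
    .filter(Boolean);
  return lines
    .map((line) => {
      if (!line.startsWith('m=video')) return line;
      const parts = line.split(' ');
      const header = parts.slice(0, 3); // "m=video <port> <proto>"
      const rest = parts.slice(3).filter((pt) => !payloadTypes.includes(pt));
      return [...header, ...payloadTypes, ...rest].join(' ');
    })
    .join('\r\n');
}

const sdp =
  'm=video 9 UDP/TLS/RTP/SAVPF 96 98\r\n' +
  'a=rtpmap:96 VP8/90000\r\n' +
  'a=rtpmap:98 H264/90000';
console.log(preferCodec(sdp, 'H264').split('\r\n')[0]);
// m=video 9 UDP/TLS/RTP/SAVPF 98 96
```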
Bandwidth estimation is automatic but imperfect. WebRTC's built-in congestion control (GCC — Google Congestion Control) adapts to available bandwidth, but it can be slow to ramp up and aggressive in ramping down. Monitor RTCStatsReport to track bitrate, packet loss, and jitter.
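Loss rate and send bitrate are derived by diffing two consecutive stats snapshots. The field names below mirror what `pc.getStats()` reports in its `outbound-rtp` (`packetsSent`, `bytesSent`) and corresponding `remote-inbound-rtp` (`packetsLost`) entries; the snapshot objects themselves are illustrative:

```javascript
// Derive packet-loss rate and send bitrate from two stats snapshots
// taken intervalSeconds apart.
function outboundQuality(prev, curr, intervalSeconds) {
  const sent = curr.packetsSent - prev.packetsSent;
  const lost = curr.packetsLost - prev.packetsLost;
  return {
    lossRate: sent > 0 ? lost / sent : 0,
    bitrateKbps: ((curr.bytesSent - prev.bytesSent) * 8) / 1000 / intervalSeconds,
  };
}

const quality = outboundQuality(
  { packetsSent: 1000, packetsLost: 10, bytesSent: 1_200_000 },
  { packetsSent: 1500, packetsLost: 35, bytesSent: 1_800_000 },
  2
);
console.log(quality); // { lossRate: 0.05, bitrateKbps: 2400 }
```

Sustained loss above a few percent, or a bitrate far below the configured maximum, usually means the congestion controller has backed off and it is time to lower resolution or simulcast layer.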
ICE restart is essential for mobile. When a phone switches from WiFi to cellular (or vice versa), the ICE candidates change. Without calling peerConnection.restartIce(), the connection dies. Detect network changes with the Navigator.connection API and trigger ICE restart proactively.
Firewall traversal in enterprise. Corporate firewalls often block UDP entirely. WebRTC can fall back to TURN over TCP (port 443), which looks like HTTPS traffic. Configure the TURN server to listen on 443/TCP as a last resort — it adds latency but gets through almost any firewall.
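A hedged configuration sketch for that fallback (server hostname and credentials are placeholders; `turns:` requests TLS, and `?transport=tcp` requests TCP per RFC 7065):

```javascript
// RTCPeerConnection config covering the enterprise-firewall fallback path.
const config = {
  iceServers: [
    { urls: 'stun:stun.example.com:3478' },
    {
      urls: [
        'turn:turn.example.com:3478',                 // normal UDP relay
        'turns:turn.example.com:443?transport=tcp',   // TLS/TCP on 443: looks like HTTPS
      ],
      username: 'user',
      credential: 'pass',
    },
  ],
  // For testing the relay path, 'relay' forces TURN-only connections:
  // iceTransportPolicy: 'relay',
};
```

Setting `iceTransportPolicy: 'relay'` during QA is a simple way to verify the TURN deployment actually works before real users depend on it.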
WebRTC is complex because real-time communication over the open internet is genuinely hard. But once the signaling → ICE → DTLS → media pipeline is understood, the complexity becomes manageable. Start with a 1-to-1 call, get that working end-to-end, then tackle group calls with an SFU.
Key Points
- WebRTC establishes direct peer-to-peer connections between browsers, bypassing the server for media delivery — reducing latency and server bandwidth costs.
- Signaling is not part of the WebRTC spec. The application must provide its own mechanism (WebSocket, HTTP polling, even copy-pasting SDP) to exchange connection metadata.
- ICE tries multiple connection paths simultaneously: host candidates (local IP), server-reflexive (STUN-discovered public IP), and relay (TURN). It picks the best one that works.
- About 80% of WebRTC connections succeed peer-to-peer via STUN. The remaining 20% — behind symmetric NATs or restrictive firewalls — need a TURN relay server.
- WebRTC encrypts everything by default. DTLS secures the key exchange, SRTP encrypts media, and there is no option to disable encryption — it is mandatory in the spec.
Key Components
| Component | Role |
|---|---|
| Signaling Server | Exchanges SDP offers/answers and ICE candidates between peers — not part of WebRTC spec, the transport is up to the application (WebSocket, HTTP, etc.) |
| ICE Framework | Interactive Connectivity Establishment — discovers the best network path between peers by testing STUN candidates and TURN relays |
| STUN Server | Tells a peer its public IP and port as seen from outside the NAT, enabling direct peer-to-peer connections |
| TURN Server | Media relay for when direct connection fails — routes all traffic through the server, used in ~20% of real-world connections |
| RTCPeerConnection | Core browser API that manages the peer connection lifecycle, ICE negotiation, codec selection, and media/data transport |
When to Use
Use WebRTC for real-time audio/video calls, peer-to-peer data transfer, or any scenario requiring sub-second latency between browsers. Do not use it for server-to-client streaming (use SSE or HLS), large group broadcasts (use an SFU or CDN), or anything where server-side control of the media pipeline is critical.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Twilio | Managed | Production-ready video/voice APIs with global TURN infrastructure and recording | Small-Enterprise |
| LiveKit | Open Source | Open-source SFU with room-based video conferencing and well-maintained client SDKs | Medium-Enterprise |
| Janus | Open Source | Lightweight, plugin-based WebRTC gateway for custom media routing pipelines | Medium-Large |
| mediasoup | Open Source | Node.js-based SFU library for building custom video conferencing architectures | Medium-Enterprise |
Debug Checklist
- Check ICE connection state — if stuck at 'checking', the peers cannot find a path. Verify STUN/TURN server reachability.
- Inspect SDP offer/answer in chrome://webrtc-internals — confirm codec negotiation succeeded and both peers agree on media formats.
- Verify TURN server credentials and connectivity. Use trickle ICE test (webrtc.github.io/samples/src/content/peerconnection/trickle-ice) to validate.
- Monitor chrome://webrtc-internals stats — check packets sent/received, round-trip time, and bandwidth estimation graphs.
- Test on restrictive networks (corporate VPNs, hotel WiFi) where symmetric NATs block STUN. These environments require TURN.
Common Mistakes
- Forgetting to deploy a TURN server. STUN alone fails for ~20% of users behind symmetric NATs. Without TURN, those users simply cannot connect.
- Using a public TURN server in production. TURN relays significant bandwidth — this demands dedicated infrastructure or a paid service with capacity planning.
- Assuming WebRTC scales like a client-server system. Connections are point-to-point: a 10-person full-mesh call requires 9 connections per peer, which overwhelms upload bandwidth and CPU.
- Not implementing a Selective Forwarding Unit (SFU) for group calls. Beyond 3-4 participants, full mesh is impractical — a media server is required.
- Ignoring ICE restart. When a user switches from WiFi to cellular, the ICE candidates change. Without ICE restart, the call drops.
Real World Usage
- Google Meet uses WebRTC with an SFU architecture — each participant sends one stream to the server, which selectively forwards streams to other participants.
- Discord uses WebRTC for voice channels, with their own media server infrastructure handling thousands of concurrent voice connections per server.
- Zoom uses WebRTC for their browser client, though their desktop client uses a custom protocol for additional optimizations.
- Twilio built their entire real-time communication platform on WebRTC, providing APIs for voice, video, and data channels.
- Peer-to-peer file sharing apps like ShareDrop use WebRTC Data Channels to transfer files directly between browsers without uploading to a server.