Routing & BGP Basics
BGP is the postal routing system of the internet — 70,000+ networks announcing which addresses they can reach, selecting paths based on policy, not just distance.
The Problem
The internet is not a single network — it is 70,000+ independently operated networks (Autonomous Systems) that agree to exchange traffic. BGP is the protocol they use to tell each other which IP prefixes they can reach. When BGP fails, chunks of the internet go dark.
Mental Model
Think of BGP as GPS navigation for data. A packet needs to get from Mumbai to Virginia, and there are many possible paths through different networks. BGP is the system where each network says 'I know how to reach these addresses, and the path goes through these networks.' Routers pick the best path from the options available, weighing business preferences, policy rules, and AS path length.
Architecture Diagram
How It Works
The internet is a network of networks. There are over 70,000 Autonomous Systems (ASes) — each one independently operated by an ISP, cloud provider, enterprise, or content company. BGP (Border Gateway Protocol) is how these networks tell each other which IP prefixes they can reach.
The Basics
Each AS has an AS Number (ASN) — a unique identifier. Some well-known ones:
| ASN | Organization |
|---|---|
| AS15169 | Google |
| AS13335 | Cloudflare |
| AS16509 | Amazon (AWS) |
| AS32934 | Meta (Facebook) |
| AS8075 | Microsoft |
| AS3356 | Lumen (Level 3) — major transit provider |
BGP routers establish peering sessions over TCP port 179. Once connected, they exchange route announcements. An announcement says: "I can reach prefix 203.0.113.0/24, and the AS path to get there is [AS15169, AS3356]."
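As a rough illustration of what an announcement carries, the sketch below models a route as a prefix plus an AS path. The `Announcement` class and the `receive` and `export` helpers are hypothetical names invented for this example, not part of any BGP implementation. It shows two behaviors: an AS prepends its own ASN when it passes a route to an external neighbor, and a router rejects any route whose path already contains its own ASN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    prefix: str                 # e.g. "203.0.113.0/24"
    as_path: tuple[int, ...]    # nearest AS first, origin AS last


def receive(local_asn: int, ann: Announcement) -> bool:
    """Accept the route unless our own ASN is already in the path (a loop)."""
    return local_asn not in ann.as_path


def export(local_asn: int, ann: Announcement) -> Announcement:
    """Prepend our ASN before advertising the route to an eBGP neighbor."""
    return Announcement(ann.prefix, (local_asn, *ann.as_path))


# AS3356 originates the prefix, AS15169 passes it on.
origin = Announcement("203.0.113.0/24", (3356,))
via_15169 = export(15169, origin)

print(via_15169.as_path)          # (15169, 3356)
print(receive(3356, via_15169))   # False: AS3356 sees itself in the path
print(receive(174, via_15169))    # True: no loop for AS174
```

The resulting path matches the announcement quoted above: the last ASN in the list is the origin, and each AS the route passes through is added at the front.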
BGP Path Selection
When a router learns multiple paths to the same prefix, it picks the best one using a deterministic priority chain:
- Highest Local Preference: operator-configured and carried only inside the AS over iBGP, never sent to eBGP peers. "I prefer to send traffic through this peer."
- Shortest AS Path: fewer networks to traverse. [AS3356] beats [AS3356, AS174, AS7018].
- Lowest Origin Type: IGP is preferred over EGP, which is preferred over Incomplete. Rarely a tiebreaker in practice.
- Lowest MED: Multi-Exit Discriminator. A hint from the neighboring AS about which entry point to use.
- eBGP over iBGP: prefer routes learned from external peers over internal ones.
- Lowest IGP cost to next hop: use the closest exit point (hot-potato routing).
- Lowest Router ID: final tiebreaker.
In practice, the first two steps decide almost everything. Operators set local preference to implement business policy (prefer paid transit over free peering, or vice versa), and AS path length handles most of the rest.
# View BGP routes on a FRRouting router
vtysh -c "show bgp ipv4 unicast 203.0.113.0/24"
# BGP routing table entry for 203.0.113.0/24
# Paths: (2 available, best #1)
# AS Path: 3356 15169, from 198.51.100.1 (peer-1)
# Local Pref: 200, Weight: 0, Valid, External, Best
# AS Path: 174 3356 15169, from 198.51.100.2 (peer-2)
# Local Pref: 100, Weight: 0, Valid, External
# The first path wins: higher Local Pref (200 > 100) and shorter AS path (2 vs 3)
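The priority chain can be sketched as a sort key, where each tuple element is consulted only when all earlier elements tie. This is a simplified model, not how any particular router implements it. The `Path` class and its field names are assumptions made for the example, vendor-specific steps such as weight are left out, and MED is compared across all paths even though real implementations normally compare it only between paths from the same neighboring AS.

```python
from dataclasses import dataclass

# Lower rank is preferred: IGP, then EGP, then Incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Path:
    peer: str
    as_path: tuple[int, ...]
    local_pref: int = 100
    origin: str = "igp"
    med: int = 0
    ebgp: bool = True
    igp_cost: int = 0
    router_id: str = "0.0.0.0"


def preference_key(p: Path) -> tuple:
    """Smaller key wins; elements follow the priority chain in order."""
    return (
        -p.local_pref,                      # highest local preference
        len(p.as_path),                     # shortest AS path
        ORIGIN_RANK[p.origin],              # lowest origin type
        p.med,                              # lowest MED (simplified)
        0 if p.ebgp else 1,                 # eBGP over iBGP
        p.igp_cost,                         # lowest IGP cost to next hop
        tuple(int(octet) for octet in p.router_id.split(".")),  # lowest router ID
    )


def best_path(paths: list[Path]) -> Path:
    return min(paths, key=preference_key)


# The two paths from the FRRouting output above.
peer1 = Path("peer-1", (3356, 15169), local_pref=200, router_id="198.51.100.1")
peer2 = Path("peer-2", (174, 3356, 15169), local_pref=100, router_id="198.51.100.2")
print(best_path([peer1, peer2]).peer)       # peer-1

# Local preference outranks AS path length: a one-hop path still loses.
short = Path("peer-3", (15169,), local_pref=100)
print(best_path([peer1, short]).peer)       # peer-1
```

Modeling the chain as a tuple makes the ordering explicit: a later attribute can never override an earlier one.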
eBGP vs iBGP
eBGP (External BGP) runs between different ASes. This is the internet routing protocol — the one that connects ISPs, cloud providers, and enterprises.
iBGP (Internal BGP) runs within a single AS, distributing externally learned routes to internal routers. iBGP has a key limitation: routes learned via iBGP are not re-advertised to other iBGP peers (to prevent loops). This means the network either needs a full mesh of iBGP sessions (O(n^2) — impractical for large networks) or route reflectors that centralize route distribution.
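The scaling difference is easy to see with a little arithmetic. The sketch below counts sessions for a full mesh and for a route-reflector design. The reflector layout is an assumption made for this example: every client peers with every reflector, and the reflectors are fully meshed among themselves.

```python
def full_mesh_sessions(routers: int) -> int:
    """Every router peers with every other router: n * (n - 1) / 2 sessions."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers: int, reflectors: int = 2) -> int:
    """Clients peer with each reflector; reflectors are meshed with each other."""
    clients = routers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (10, 100, 1000):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n))
# 10 45 17
# 100 4950 197
# 1000 499500 1997
```

With two reflectors the session count grows linearly with the number of routers, while the full mesh grows quadratically.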
Interior Gateway Protocols: OSPF and IS-IS
BGP handles inter-AS routing. For routing within an AS, use an IGP:
| Protocol | Type | Convergence | Scale | Used By |
|---|---|---|---|---|
| OSPF | Link-state | Fast (sub-second with BFD) | Hundreds of routers | Enterprises, small ISPs |
| IS-IS | Link-state | Fast (sub-second with BFD) | Thousands of routers | Large ISPs, hyperscalers |
| RIP | Distance-vector | Slow (30s updates) | Tiny networks | Nobody in production anymore |
Google, Meta, and most hyperscalers use IS-IS internally because it scales better in very large flat topologies and carries both IPv4 and IPv6 natively.
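Link-state protocols such as OSPF and IS-IS give every router a map of the topology and then run a shortest-path computation over it. The sketch below is a minimal version of that computation (Dijkstra's algorithm) on a made-up four-router topology. The router names and link costs are hypothetical.

```python
import heapq


def shortest_path_costs(graph: dict[str, dict[str, int]], source: str) -> dict[str, int]:
    """Return the lowest total link cost from source to every reachable router."""
    costs = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return costs


topology = {
    "r1": {"r2": 10, "r3": 5},
    "r2": {"r1": 10, "r4": 1},
    "r3": {"r1": 5, "r4": 20},
    "r4": {"r2": 1, "r3": 20},
}
print(shortest_path_costs(topology, "r1"))
# {'r1': 0, 'r2': 10, 'r3': 5, 'r4': 11}
```

This is the contrast with BGP: an IGP minimizes a metric, while BGP applies policy first and falls back on path length only later.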
Real-World BGP Incidents
BGP's power is also its weakness — a single misconfiguration can have global consequences.
Facebook Outage (October 4, 2021)
During routine maintenance, a command intended to assess backbone capacity instead withdrew all BGP routes for Facebook's prefixes. Every Facebook AS announcement disappeared from the global routing table. The result:
- Facebook, Instagram, WhatsApp, and Oculus went completely offline
- DNS servers for facebook.com became unreachable (they were inside the withdrawn prefix range)
- Engineers could not remotely fix the problem because their own management tools were within the affected network
- Physical access to data centers was needed — door badge systems ran on the same network
- Total downtime: 6 hours. Estimated revenue loss: $60M+
The fix was sending engineers physically to data centers to restore BGP sessions from the console.
Pakistan YouTube Hijack (February 24, 2008)
Pakistan Telecom was ordered by the government to block YouTube. An engineer announced a more-specific route (208.65.153.0/24) for YouTube's prefix, intending to blackhole it domestically. But the announcement leaked to upstream providers via BGP and propagated globally. For about 2 hours, most of the world's YouTube traffic was routed to Pakistan Telecom and dropped.
This incident led to wider adoption of RPKI (Resource Public Key Infrastructure) — a system where prefix owners cryptographically sign which ASes are authorized to announce their IP ranges.
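The hijack worked because routers forward on the longest matching prefix, so a more-specific /24 beats a covering route regardless of who announces it. The sketch below illustrates this with Python's standard `ipaddress` module. The covering prefix 208.65.152.0/22 and the ASNs (AS36561 for YouTube, AS17557 for Pakistan Telecom) are the values commonly reported for this incident rather than taken from the text above, and the destination address is an arbitrary one inside the /24.

```python
import ipaddress


def lookup(table: dict[str, str], destination: str):
    """Return (prefix, origin) for the longest prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), origin)
        for prefix, origin in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    network, origin = max(matches, key=lambda match: match[0].prefixlen)
    return str(network), origin


table = {"208.65.152.0/22": "AS36561 (YouTube)"}
print(lookup(table, "208.65.153.238"))
# ('208.65.152.0/22', 'AS36561 (YouTube)')

# The leaked more-specific route arrives.
table["208.65.153.0/24"] = "AS17557 (Pakistan Telecom)"
print(lookup(table, "208.65.153.238"))
# ('208.65.153.0/24', 'AS17557 (Pakistan Telecom)')
```

AS path length and local preference never come into play here: the two routes are for different prefixes, so the /24 wins at forwarding time.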
Lessons from These Incidents
- Prefix filtering is non-negotiable. Every BGP session should have inbound and outbound prefix filters.
- RPKI/ROA validation prevents hijacks. Had ROAs covered YouTube's prefixes in 2008, and had upstream ISPs validated route origins, they would have rejected Pakistan Telecom's announcement.
- BGP changes should go through CI/CD. Treat router configs like code — peer review, automated validation, staged rollout.
- Out-of-band management is critical. If the management network goes down with BGP, there is no way to fix BGP remotely.
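To make the RPKI lesson concrete, the sketch below follows the general shape of route origin validation: a route is valid if some covering ROA matches both its origin ASN and its prefix length, invalid if ROAs cover it but none match, and not-found otherwise. The `ROA` class and `validate` function are names invented for this example, and the ROA shown is hypothetical (the /22 and ASNs are the commonly reported values for the 2008 incident, not data from a real repository).

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    prefix: str       # address block the ROA covers
    max_length: int   # longest prefix length the owner allows
    asn: int          # AS authorized to originate the block


def validate(prefix: str, origin_asn: int, roas: list[ROA]) -> str:
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # subnet_of raises TypeError across IP versions, so check version first.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [ROA("208.65.152.0/22", 22, 36561)]  # hypothetical ROA

print(validate("208.65.152.0/22", 36561, roas))  # valid
print(validate("208.65.153.0/24", 17557, roas))  # invalid: wrong origin AS
print(validate("208.65.153.0/24", 36561, roas))  # invalid: longer than max_length
print(validate("203.0.113.0/24", 64500, roas))   # not-found: no covering ROA
```

The third case shows why `max_length` matters: without it, an attacker could forge the legitimate origin ASN and still announce a more-specific prefix.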
Production Considerations
BGP Session Security
# Always use MD5 authentication on BGP sessions
router bgp 65001
 neighbor 198.51.100.1 password S3cureP@ssw0rd
 # Set a maximum-prefix limit so a route leak tears down the session instead of exhausting router memory
 # (adding "warning-only" would only log the violation and keep accepting routes)
 neighbor 198.51.100.1 maximum-prefix 10000
 # Use prefix lists to filter accepted and announced prefixes
 neighbor 198.51.100.1 prefix-list ALLOW-IN in
 neighbor 198.51.100.1 prefix-list ALLOW-OUT out
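The effect of an inbound prefix list combined with a maximum-prefix limit can be modeled in a few lines. This is a simplified sketch, not FRRouting's actual behavior: the `permitted` and `process_updates` helpers are hypothetical, the allow list contents are made up, and the limit is applied to every received prefix before filtering, which may differ between implementations.

```python
import ipaddress


def permitted(prefix: str, allow_list: list[tuple[str, int]]) -> bool:
    """Allow a route if it sits inside an allowed block and is no longer than `le`."""
    route = ipaddress.ip_network(prefix)
    for block, le in allow_list:
        net = ipaddress.ip_network(block)
        if route.version == net.version and route.subnet_of(net) and route.prefixlen <= le:
            return True
    return False


def process_updates(prefixes: list[str], allow_list: list[tuple[str, int]], max_prefixes: int):
    """Return (accepted prefixes, session_up). Exceeding the limit drops the session."""
    accepted = []
    for received, prefix in enumerate(prefixes, start=1):
        if received > max_prefixes:
            return [], False  # session torn down, all routes from the peer are removed
        if permitted(prefix, allow_list):
            accepted.append(prefix)
    return accepted, True


ALLOW_IN = [("203.0.113.0/24", 24)]  # hypothetical contents of the ALLOW-IN list

print(process_updates(["203.0.113.0/24", "192.0.2.0/24"], ALLOW_IN, max_prefixes=3))
# (['203.0.113.0/24'], True)

leak = [f"10.{i}.0.0/16" for i in range(5)]
print(process_updates(leak, ALLOW_IN, max_prefixes=3))
# ([], False)
```

The two controls are complementary: the prefix list rejects individual unwanted routes, while the limit protects the router when a peer sends far more routes than expected.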
Monitoring BGP Health
The key metrics to track:
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| BGP session state | Established | Idle/Active (flapping) | Down for >1min |
| Received prefixes | Stable count | >10% change in 5min | >50% drop |
| AS path length to key prefixes | 2-4 hops | >6 hops (suboptimal) | Unreachable |
| BGP update rate | <100/min steady state | >1000/min | >10,000/min (instability) |
| RPKI invalid routes | 0 | Any (review immediately) | Accepting invalids |
# Check BGP session health
vtysh -c "show bgp summary"
# Inspect RIB statistics (prefix counts, AS path lengths); sample repeatedly to see change over time
vtysh -c "show bgp ipv4 unicast statistics"
# Verify the announced prefixes are visible globally
# Use public looking glass: https://bgp.he.net/
# Or the RIPEstat looking-glass data call (it takes a prefix; replace with the announced prefix):
curl -s "https://stat.ripe.net/data/looking-glass/data.json?resource=203.0.113.0/24"
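The received-prefix thresholds from the table above translate directly into an alerting rule. The sketch below is one possible encoding, assuming the two counts are samples taken five minutes apart. The function name and the sample numbers are made up for illustration.

```python
def classify_received_prefixes(previous: int, current: int) -> str:
    """Classify the change in received prefix count between two samples."""
    if previous <= 0:
        raise ValueError("previous prefix count must be positive")
    change = (current - previous) / previous
    if change < -0.50:       # more than a 50% drop
        return "critical"
    if abs(change) > 0.10:   # more than a 10% change in either direction
        return "warning"
    return "healthy"


print(classify_received_prefixes(950_000, 945_000))  # healthy
print(classify_received_prefixes(950_000, 800_000))  # warning
print(classify_received_prefixes(950_000, 400_000))  # critical
print(classify_received_prefixes(1_000, 1_200))      # warning
```

A sudden increase is flagged as well as a drop, since a jump in received prefixes is a typical sign of a route leak from the peer.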
Anycast — BGP's Superpower for CDNs
Anycast uses BGP to announce the same IP prefix from multiple locations. When a user sends a packet to an Anycast IP, BGP routing delivers it to the nearest announcing location.
Cloudflare announces 1.1.1.1 from 300+ cities. Google announces 8.8.8.8 from dozens of PoPs. Every DNS root server uses Anycast. This is how sub-10ms DNS resolution works globally — the query goes to whichever server is closest in terms of BGP path, not physical distance (though they are usually correlated).
The tradeoff is that Anycast works best for stateless or connection-light protocols like DNS (UDP). For TCP-based services, a BGP route change can shift traffic to a different server mid-connection, breaking the TCP stream. CDNs handle this with flow-aware load balancing and connection migration techniques.
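A toy model shows both the benefit and the tradeoff. Each client effectively reaches whichever announcing site has the best BGP path from its vantage point. The sketch below reduces that to AS path length alone, which is a simplification, and the site names and hop counts are hypothetical.

```python
def nearest_site(path_lengths: dict[str, int], active_sites: set[str]):
    """Pick the active site with the shortest AS path (site name breaks ties)."""
    candidates = {site: hops for site, hops in path_lengths.items() if site in active_sites}
    if not candidates:
        return None
    return min(candidates, key=lambda site: (candidates[site], site))


# AS path length from one client's network to each announcing site.
client_view = {"mumbai": 1, "frankfurt": 3, "ashburn": 4}
active = {"mumbai", "frankfurt", "ashburn"}

print(nearest_site(client_view, active))   # mumbai

# The Mumbai site withdraws its announcement (maintenance or failure).
active.discard("mumbai")
print(nearest_site(client_view, active))   # frankfurt
```

The second lookup is the failure mode described above: packets for an existing TCP connection would now arrive at Frankfurt, which has no state for that connection.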
Key Points
- BGP is the protocol that glues the entire internet together. Every ISP, cloud provider, and CDN uses it.
- BGP selects routes based on a priority chain: local preference → AS path length → origin type → MED → eBGP over iBGP → lowest IGP cost to next hop → lowest router ID.
- A BGP misconfiguration can take down portions of the internet. Facebook's October 2021 outage was caused by a bad BGP withdrawal.
- Interior Gateway Protocols (OSPF, IS-IS) handle routing within an AS. BGP handles routing between ASes.
- BGP is a policy-based protocol. Unlike IGPs that find the shortest path, BGP lets operators express business relationships through routing policy.
Key Components
| Component | Role |
|---|---|
| Autonomous System (AS) | A network or group of networks under a single administrative domain, identified by an AS number (ASN) |
| BGP Peering Session | A TCP connection between two BGP routers that exchange routing information — prefix announcements and withdrawals |
| Routing Table (RIB) | The database of all known routes. BGP selects the best path from multiple options based on attributes. |
| Prefix Announcement | An AS tells its neighbors: 'I can reach 203.0.113.0/24 — send traffic for that range to me' |
| AS Path | The ordered list of ASNs a route has traversed. Used for loop detection and path selection — shorter is generally preferred. |
When to Use
BGP comes into play when running an AS (hosting companies, enterprises with their own IP space), connecting to cloud providers via Direct Connect, using Anycast for global load balancing, or debugging why traffic to a specific destination takes a weird path.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| BIRD | Open Source | Full-featured BGP daemon used by major IXPs and hosting companies | Large-Enterprise |
| FRRouting (FRR) | Open Source | Multi-protocol routing suite (BGP, OSPF, IS-IS) for Linux — successor to Quagga | Medium-Enterprise |
| AWS Direct Connect | Managed | Private BGP peering with AWS over dedicated fiber, bypassing the public internet | Large-Enterprise |
| Cloudflare Magic Transit | Managed | BGP-based DDoS protection — announce prefixes through Cloudflare's network | Large-Enterprise |
Debug Checklist
- Check BGP session state: 'show bgp summary' on the router or 'birdc show protocols' on BIRD. Look for Established state.
- Verify prefix visibility: Use bgp.he.net or RIPE RIS to see if the prefix is visible globally and which AS paths are in use.
- Check for route leaks: Look at the AS path for unexpected ASNs. A normally 3-hop path that suddenly shows 8 hops is suspicious.
- Validate RPKI: Use rpki-validator or Cloudflare's RPKI dashboard to check if the ROAs are valid and covering the announced prefixes.
- Monitor convergence: Track BGP update/withdrawal counts. A burst of updates indicates instability — route flapping or a misconfiguration.
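The route-leak check in the list above can be partly automated. The sketch below flags an observed AS path that contains ASNs outside an expected set or that has grown well beyond its usual length. The function name, the `max_extra_hops` tolerance, and the sample ASNs are assumptions for this example. Legitimate AS path prepending also lengthens paths, so a warning is a prompt to investigate rather than proof of a leak.

```python
def path_warnings(as_path, expected_asns, baseline_hops, max_extra_hops=2):
    """Return a list of human-readable warnings for an observed AS path."""
    warnings = []
    unexpected = [asn for asn in as_path if asn not in expected_asns]
    if unexpected:
        warnings.append(f"unexpected ASNs: {unexpected}")
    if len(as_path) > baseline_hops + max_extra_hops:
        warnings.append(f"path grew from {baseline_hops} to {len(as_path)} hops")
    return warnings


expected = {174, 3356, 15169}

print(path_warnings((174, 3356, 15169), expected, baseline_hops=3))
# []

leaked = (174, 64500, 64501, 64502, 64503, 3356, 15169)
print(path_warnings(leaked, expected, baseline_hops=3))
# ['unexpected ASNs: [64500, 64501, 64502, 64503]', 'path grew from 3 to 7 hops']
```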
Common Mistakes
- Announcing prefixes the AS does not own. Without RPKI validation, anyone can claim any IP prefix — this is a BGP hijack.
- Not implementing maximum prefix limits on BGP sessions. A peer leaking a full table (900K+ routes) can overflow the router's memory.
- Ignoring BGP convergence time. After a failure, BGP can take 30-90 seconds to converge — an eternity for real-time traffic.
- Using BGP for internal routing when OSPF or IS-IS would be simpler and converge faster. BGP inside an AS adds unnecessary complexity.
- Not deploying RPKI/ROA to validate route origins. This is the single most impactful action for preventing route hijacking.
Real World Usage
- Cloudflare uses BGP Anycast to announce the same IP prefix from 300+ cities. The nearest BGP router wins, giving users the closest edge server.
- AWS Direct Connect establishes BGP sessions between the on-prem router and AWS, providing private connectivity with predictable latency.
- Google operates AS15169, one of the most connected ASes in the world, peering at nearly every major internet exchange point.
- Facebook's October 2021 outage: a maintenance command withdrew all BGP routes, making Facebook's DNS servers unreachable for 6+ hours.
- In 2008, Pakistan Telecom hijacked YouTube's /24 prefix via BGP to block YouTube domestically, accidentally taking YouTube offline globally for 2 hours.