ARP & MAC Addresses
ARP broadcasts 'who has this IP?' on the local network, gets a MAC address reply, and caches it — simple, essential, and completely insecure by design.
The Problem
IP addresses are logical — they are assigned by software and can change. But Ethernet frames must be addressed to a specific hardware (MAC) address to be delivered on the wire. ARP bridges this gap, translating between the two address types. Without it, IP networking on Ethernet simply does not work.
Mental Model
Like shouting in a crowded room: 'Who has phone number 555-1234?' Everyone hears the shout (broadcast), but only the person with that number raises their hand and says 'That is me, here is my face so I can be found directly next time' (unicast reply). The face gets remembered (cache) so there is no need to shout again.
How It Works
When a machine wants to send a packet to 192.168.1.20 on the same subnet, it has a problem: Ethernet frames are addressed by MAC address, not IP address. The machine knows the destination IP but not the destination MAC. ARP (Address Resolution Protocol) solves this with a simple request-and-reply exchange, after which the answer is cached.
The ARP Process
Step 1: ARP Request (Broadcast)
The machine sends an Ethernet frame with destination FF:FF:FF:FF:FF:FF — the broadcast address. Every device on the local network segment receives this frame. The payload says: "Who has 192.168.1.20? Tell 192.168.1.10 at AA:AA:AA:AA:AA:AA."
Step 2: ARP Reply (Unicast)
The device at 192.168.1.20 recognizes its own IP and sends a reply directly back (unicast) to AA:AA:AA:AA:AA:AA: "I am 192.168.1.20 and my MAC address is BB:BB:BB:BB:BB:BB."
Step 3: Cache and Send
The machine stores this mapping in its ARP cache and sends the original data frame addressed to BB:BB:BB:BB:BB:BB. Future packets to the same IP skip the ARP process entirely until the cache entry expires.
# View the ARP cache
arp -a
# ? (192.168.1.1) at dc:a6:32:12:34:56 [ether] on eth0
# ? (192.168.1.20) at aa:bb:cc:dd:ee:ff [ether] on eth0
# Same thing with the modern 'ip' command
ip neigh show
# 192.168.1.1 dev eth0 lladdr dc:a6:32:12:34:56 REACHABLE
# 192.168.1.20 dev eth0 lladdr aa:bb:cc:dd:ee:ff STALE
# Send an ARP request manually
arping -c 3 192.168.1.20
# ARPING 192.168.1.20 from 192.168.1.10 eth0
# Unicast reply from 192.168.1.20 [AA:BB:CC:DD:EE:FF] 0.823ms
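To make the "check the cache first, ARP only on a miss" decision concrete, here is a minimal sketch that looks up an IP in a saved copy of `ip neigh show` output. The helper name `lookup_mac` and the sample table (including the FAILED entry) are invented for illustration; on a live host you would pipe `ip neigh show` into the function instead.

```shell
# Hypothetical helper: read `ip neigh show`-style lines on stdin and print
# the MAC cached for the given IP. Prints nothing if no usable entry exists.
lookup_mac() {
    awk -v ip="$1" '
        # Field 1 is the IP; the MAC follows the "lladdr" keyword.
        # FAILED and INCOMPLETE entries carry no lladdr, so they are skipped.
        $1 == ip {
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") { print $(i + 1); exit }
        }'
}

# Sample data modeled on the output above (not a live table).
sample='192.168.1.1 dev eth0 lladdr dc:a6:32:12:34:56 REACHABLE
192.168.1.20 dev eth0 lladdr aa:bb:cc:dd:ee:ff STALE
192.168.1.30 dev eth0 FAILED'

for ip in 192.168.1.20 192.168.1.30; do
    mac=$(printf '%s\n' "$sample" | lookup_mac "$ip")
    if [ -n "$mac" ]; then
        echo "$ip: cache hit, send frame to $mac"
    else
        echo "$ip: cache miss, ARP request needed"
    fi
done
```

The kernel makes this decision itself for every outgoing packet; the sketch only mirrors the logic so the cache's role is easier to see.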
What Happens Across Subnets
ARP only works within a single broadcast domain. When the destination IP is on a different subnet, the process is different:
- The machine checks the destination against its subnet mask
- Destination is outside the local subnet, so it ARPs for the default gateway's MAC instead
- The gateway (router) receives the frame, strips the Ethernet header, and routes the IP packet
- At the destination subnet, the last router on the path ARPs for the final destination's MAC and delivers the frame
This means the Ethernet header is rebuilt at every hop, while the source and destination IP addresses stay the same for the whole journey (assuming no NAT; only fields such as the TTL and header checksum change). The MAC addresses are hop-by-hop; the IP addresses are end-to-end.
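The subnet check that decides whom to ARP for can be sketched with plain integer arithmetic. This is an illustration only: `ip_to_int` and `arp_target` are hypothetical helper names, the addresses are examples, and a real host consults its full routing table rather than a single prefix and gateway.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the IP that must be resolved with ARP in order to reach dst:
# dst itself when it shares the local subnet, otherwise the gateway.
# Arguments: source IP, destination IP, prefix length, gateway IP.
arp_target() {
    src=$1 dst=$2 prefix=$3 gateway=$4
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    if [ $(( $(ip_to_int "$src") & mask )) -eq $(( $(ip_to_int "$dst") & mask )) ]; then
        echo "$dst"
    else
        echo "$gateway"
    fi
}

arp_target 192.168.1.10 192.168.1.20 24 192.168.1.1   # same subnet
arp_target 192.168.1.10 10.0.5.7 24 192.168.1.1       # different subnet
```

The first call prints the destination itself, the second prints the gateway, matching the two cases described above.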
ARP Cache Management
The ARP cache is not permanent. Entries go through states:
| State | Meaning | Linux default timing |
|---|---|---|
| REACHABLE | Recently confirmed, actively used | About 30 seconds (randomized between 15 and 45) |
| STALE | Confirmation expired; kept and revalidated on next use | Until garbage collected |
| DELAY | A packet was sent using a stale entry; waiting for confirmation before probing | 5 seconds before probe |
| PROBE | Actively sending ARP requests to reconfirm | 3 retries |
| FAILED | No ARP reply received after probing | Removed |
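A quick way to see how a table is distributed across these states is to count the last field of each `ip neigh show` line. The sketch below runs on made-up sample data, and `count_states` is a hypothetical helper name; on a real host, pipe `ip neigh show` into it.

```shell
# Hypothetical helper: count neighbor entries per state.
# The state is the last field of each `ip neigh show` line.
count_states() {
    awk '{ n[$NF]++ } END { for (s in n) print s, n[s] }' | sort
}

# Made-up sample table (not live output).
sample='192.168.1.1 dev eth0 lladdr dc:a6:32:12:34:56 REACHABLE
192.168.1.20 dev eth0 lladdr aa:bb:cc:dd:ee:ff STALE
192.168.1.21 dev eth0 lladdr aa:bb:cc:dd:ee:01 STALE
192.168.1.30 dev eth0 FAILED'

printf '%s\n' "$sample" | count_states
```

A large share of STALE entries is normal on a quiet host; a growing number of FAILED entries usually deserves a closer look.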
# Tune ARP cache behavior on Linux
# How long an entry stays REACHABLE
sudo sysctl -w net.ipv4.neigh.eth0.base_reachable_time_ms=30000
# Neighbor table garbage collection thresholds (number of entries)
sudo sysctl -w net.ipv4.neigh.default.gc_thresh1=1024 # Below this count, GC never runs
sudo sysctl -w net.ipv4.neigh.default.gc_thresh2=2048 # Soft limit: GC becomes aggressive
sudo sysctl -w net.ipv4.neigh.default.gc_thresh3=4096 # Hard limit: no new entries
In large environments (thousands of containers or VMs on a flat network), ARP cache pressure becomes a real problem. The kernel's default GC thresholds (128, 512, and 1024 entries) are often too low, causing "Neighbour table overflow" errors. Kubernetes clusters frequently need these tuned upward.
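As a rough way to reason about cache pressure, the sketch below classifies an entry count against the three thresholds. `neigh_pressure` is a hypothetical helper and the labels are a simplification of the kernel's behavior. On a live host the inputs could come from `ip -4 neigh show | wc -l` and `sysctl -n net.ipv4.neigh.default.gc_thresh1` (and likewise for the other two thresholds).

```shell
# Hypothetical helper: classify a neighbor-table size against the three
# gc_thresh values. Arguments: entries thresh1 thresh2 thresh3.
neigh_pressure() {
    if   [ "$1" -ge "$4" ]; then echo "full: new entries may be refused"
    elif [ "$1" -ge "$3" ]; then echo "high: above gc_thresh2"
    elif [ "$1" -ge "$2" ]; then echo "normal: garbage collection may run"
    else                         echo "low: below gc_thresh1, no garbage collection"
    fi
}

# Checked here against the kernel defaults of 128 / 512 / 1024.
neigh_pressure 100 128 512 1024
neigh_pressure 900 128 512 1024
neigh_pressure 1024 128 512 1024
```

A node that regularly reports "high" with the defaults is a candidate for the tuning shown above.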
ARP Security — Why It Matters
ARP has no authentication whatsoever. Any device on the network can send an ARP reply claiming any IP-to-MAC mapping. This makes ARP spoofing trivially easy and genuinely dangerous.
ARP Spoofing Attack
An attacker sends fake ARP replies to the victim and the gateway:
- Attacker tells victim: "192.168.1.1 (gateway) is at ATTACKER-MAC"
- Attacker tells gateway: "192.168.1.10 (victim) is at ATTACKER-MAC"
- Both the victim and gateway update their ARP cache with the attacker's MAC
- All traffic between victim and gateway now flows through the attacker (man-in-the-middle)
# Detect ARP spoofing: look for the gateway MAC changing
watch -n 1 "arp -a | grep 'gateway-ip'"
# Or use arpwatch to monitor and alert on MAC changes
sudo arpwatch -i eth0 -d
# Capture suspicious ARP traffic
sudo tcpdump -i eth0 arp -n -e
# Look for: different source MACs claiming the same IP
Defenses
| Defense | Level | How It Works |
|---|---|---|
| Dynamic ARP Inspection (DAI) | Switch | Validates ARP packets against DHCP snooping database. Only allows ARP replies that match known IP-MAC bindings. |
| Static ARP entries | Host | Manually set arp -s gateway-ip gateway-mac. Entries cannot be overwritten by ARP replies. Does not scale well. |
| 802.1X Port Authentication | Switch | Authenticates devices before they can send any traffic, including ARP. |
| VLAN segmentation | Switch | Limits ARP broadcast scope. Attackers can only spoof within their VLAN. |
| ArpON | Host | ARP handler daemon that detects and blocks spoofing attempts on individual hosts. |
In cloud environments, this is less of a concern because the hypervisor handles ARP. AWS uses proxy ARP — the hypervisor responds to ARP requests on behalf of instances, and instances never see each other's real ARP traffic.
Real-World Impact
Gratuitous ARP and High Availability
Gratuitous ARP is an ARP announcement that nobody asked for (it can be sent as either a request or a reply). A device sends it to announce: "Hey everyone, IP 10.0.0.100 is now at my MAC address." This is critical for failover:
When a primary server fails and the backup takes over the virtual IP (VIP), the backup sends a gratuitous ARP. Every device on the LAN (and every switch) updates their ARP/MAC table to point the VIP to the new server's MAC. Without gratuitous ARP, the failover would take minutes (until ARP caches expire). With it, the failover completes in under a second.
# Send a gratuitous ARP (useful for testing failover)
arping -A -c 3 -I eth0 10.0.0.100
# Keepalived does this automatically during VRRP failover
# In keepalived.conf:
# vrrp_instance VI_1 {
#     state BACKUP
#     interface eth0
#     virtual_ipaddress {
#         10.0.0.100
#     }
#     garp_master_refresh 5 # Re-send gratuitous ARP every 5 seconds
# }
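When testing failover, it helps to confirm from a third host that the VIP's cache entry really moved to the new MAC. The sketch below compares two saved neighbor-table snapshots. `mac_of` is a hypothetical helper and both snapshots and their MAC addresses are invented; in practice each would be captured with `ip neigh show` before and after the switchover.

```shell
# Hypothetical helper: print the MAC cached for an IP in
# `ip neigh show`-style input read from stdin.
mac_of() {
    awk -v ip="$1" '$1 == ip { for (i = 2; i < NF; i++) if ($i == "lladdr") print $(i + 1) }'
}

# Invented snapshots taken before and after the failover.
before='10.0.0.100 dev eth0 lladdr 02:00:00:00:00:01 REACHABLE'
after='10.0.0.100 dev eth0 lladdr 02:00:00:00:00:02 REACHABLE'

old=$(printf '%s\n' "$before" | mac_of 10.0.0.100)
new=$(printf '%s\n' "$after" | mac_of 10.0.0.100)
if [ "$old" != "$new" ]; then
    echo "VIP 10.0.0.100 moved: $old -> $new"
else
    echo "VIP 10.0.0.100 unchanged at $old"
fi
```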
ARP in Kubernetes
Container networking heavily uses ARP. On a single node with the default bridge CNI:
- Each pod gets a veth pair: one end in the pod, one end on the bridge
- When pod A sends to pod B (same node), pod A broadcasts an ARP request for pod B's IP; the bridge floods it and pod B replies with the MAC of its veth interface
- The bridge forwards the frame to pod B's veth interface
Cross-node pod communication typically uses VXLAN or other overlay networks that encapsulate the original Ethernet frame (including ARP) inside a UDP packet. The ARP request from pod A gets tunneled to the destination node, where it is decoded and broadcast on the local bridge.
ARP Table Overflow in Large Clusters
A common production issue in large Kubernetes clusters:
kernel: [1234567.89] Neighbour table overflow.
This happens when the ARP cache exceeds the kernel's garbage collection thresholds. With 500+ pods per node (not unusual in dense deployments), the default gc_thresh3 of 1024 is too low.
# Fix: Increase ARP cache limits
sudo sysctl -w net.ipv4.neigh.default.gc_thresh1=4096
sudo sysctl -w net.ipv4.neigh.default.gc_thresh2=8192
sudo sysctl -w net.ipv4.neigh.default.gc_thresh3=16384
# Make persistent in /etc/sysctl.d/99-arp.conf
This is one of those issues that works fine in dev (10 pods), works fine in staging (100 pods), and explodes in production (1000 pods). Always tune ARP cache limits as part of the node bootstrap process.
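For node bootstrap scripts, the persistent settings can be generated rather than typed by hand. This is a sketch only: `render_arp_sysctl` is a hypothetical helper, and the 1:2:4 ratio between the thresholds simply mirrors the values used above and is not a kernel requirement.

```shell
# Hypothetical helper: print sysctl.d content for the three thresholds,
# keeping the 1:2:4 ratio used in the commands above.
render_arp_sysctl() {
    base=$1
    printf 'net.ipv4.neigh.default.gc_thresh1 = %d\n' "$base"
    printf 'net.ipv4.neigh.default.gc_thresh2 = %d\n' "$(( base * 2 ))"
    printf 'net.ipv4.neigh.default.gc_thresh3 = %d\n' "$(( base * 4 ))"
}

# Prints to stdout only. As root, redirect the output into
# /etc/sysctl.d/99-arp.conf and load it with `sysctl --system`.
render_arp_sysctl 4096
```

Keeping the values in one function makes it easy to scale all three thresholds together as node density grows.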
Key Points
- ARP operates at Layer 2 and bridges the gap between IP addresses (Layer 3) and MAC addresses (Layer 2).
- ARP requests are broadcast to every device on the LAN segment. In large flat networks, ARP traffic can become a serious problem.
- ARP cache entries expire and must be refreshed. Linux treats an entry as confirmed for about 30 seconds by default before marking it STALE; routers and switches often use longer, vendor-specific timeouts.
- ARP has zero built-in authentication. Any device can claim any IP-to-MAC mapping, which is the basis of ARP spoofing attacks.
- In cloud environments, ARP is handled differently: AWS uses proxy ARP, and most CNI plugins manage ARP for container networking.
Key Components
| Component | Role |
|---|---|
| MAC Address | A 48-bit hardware address (e.g., AA:BB:CC:DD:EE:FF) assigned to every network interface, normally burned in by the manufacturer and intended to be unique, though it can be overridden in software |
| ARP Request (Broadcast) | A broadcast frame asking 'Who has IP 192.168.1.1? Tell 192.168.1.10' — every device on the segment hears it |
| ARP Reply (Unicast) | The target responds with its MAC address directly to the requester — only the requester receives this |
| ARP Cache | A local table mapping IP→MAC that each device maintains to avoid broadcasting on every packet |
| Gratuitous ARP | An unsolicited ARP reply a device sends to announce its presence or update mappings — used in failover scenarios |
When to Use
ARP happens automatically — it is not invoked directly. But understanding it matters when debugging Layer 2 issues, configuring HA failover with virtual IPs, securing LANs against spoofing attacks, or troubleshooting connectivity on the local subnet.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| arpwatch | Open Source | Monitoring ARP activity and detecting new or changed MAC-to-IP mappings on a LAN | Small-Enterprise |
| Wireshark | Open Source | Capturing and analyzing ARP packets with full decode and filtering | Small-Enterprise |
| Dynamic ARP Inspection (DAI) | Commercial | Switch-level ARP validation using DHCP snooping database to prevent spoofing | Medium-Enterprise |
| arping | Open Source | Sending ARP requests from the command line to test Layer 2 reachability | Small-Enterprise |
Debug Checklist
- View the ARP cache: 'arp -a' or 'ip neigh show' to see current IP-to-MAC mappings.
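- Check for stale entries: 'ip neigh show nud stale'. STALE is a normal state, but an entry that still points at an old MAC can cause intermittent connectivity until it is revalidated; FAILED entries ('ip neigh show nud failed') mean the host is not answering ARP.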
- Flush a specific entry: 'ip neigh flush 192.168.1.1' to force a fresh ARP resolution.
- Watch ARP traffic live: 'sudo tcpdump -i eth0 arp -n' to see ARP requests and replies in real time.
- Detect ARP spoofing: look for multiple IPs mapping to the same MAC, or a MAC that keeps changing for a gateway IP.
Common Mistakes
- Ignoring ARP in troubleshooting. When ping fails to a host on the same subnet, the problem is often ARP, not routing.
- Allowing flat Layer 2 networks to grow too large. Thousands of hosts on one broadcast domain means ARP storms.
- Not using Dynamic ARP Inspection (DAI) on managed switches, leaving the network vulnerable to ARP spoofing.
- Assuming MAC addresses are always unique. Virtual machines, containers, and cloned images can have duplicate MACs.
- Forgetting that ARP only works within a broadcast domain. Across subnets, the router handles the MAC resolution on each segment.
Real World Usage
- Kubernetes uses ARP for pod-to-pod communication on the same node: pods resolve each other's IPs to veth MAC addresses across the node's bridge.
- Load balancers using DSR (Direct Server Return) rely on gratuitous ARP to claim a virtual IP without changing routing.
- High-availability clusters (Keepalived, VRRP) send gratuitous ARP when the VIP moves to the backup node.
- Data center switches maintain MAC address tables with thousands of entries, forwarding frames based on destination MAC.
- AWS handles ARP differently: instances use proxy ARP through the hypervisor, so real broadcast ARP never appears in a VPC.