MQTT & IoT Protocols
MQTT is a featherweight pub/sub protocol with three QoS levels, retained messages for current state, and Last Will for disconnect detection — built for billions of constrained devices.
The Problem
Billions of IoT devices need to communicate with cloud services, but they operate under severe constraints — limited CPU, memory, battery, and bandwidth. HTTP is too heavy for a sensor that sends 50-byte readings every 10 seconds over a cellular connection. IoT needs a protocol designed for minimal overhead, unreliable networks, and massive fan-out.
Mental Model
Like a bulletin board system — devices post messages to labeled sections (topics), and anyone signed up for that section gets the update. Subscribing after a message was posted still shows the latest pinned notice (retained message). And if someone disappears without saying goodbye, the board automatically posts a 'gone missing' notice (LWT).
How It Works
MQTT (Message Queuing Telemetry Transport) was designed in 1999 by Andy Stanford-Clark (IBM) and Arlen Nipper for connecting oil pipeline sensors over unreliable satellite links. That origin explains its design philosophy: minimal overhead, tolerance for unreliable networks, and support for devices that cannot afford to waste a single byte.
The protocol is built on three concepts: topics, publish/subscribe, and quality of service levels.
The Pub/Sub Model
Unlike HTTP's request/response pattern, MQTT uses publish/subscribe. Devices never talk directly to each other. Every message goes through a central broker, which routes messages based on topic subscriptions.
Publisher (Sensor) → MQTT Broker → Subscriber (Dashboard)
                                 → Subscriber (Alert System)
                                 → Subscriber (Database Logger)
This decoupling is powerful. The sensor does not know or care who is consuming its data. New subscribers — analytics pipelines, alerting systems, data lakes — can be added without changing anything on the device side.
Topic Hierarchy and Wildcards
Topics are hierarchical strings delimited by forward slashes:
building/floor3/room301/temperature
building/floor3/room301/humidity
building/floor3/room302/temperature
fleet/truck-42/gps/location
fleet/truck-42/engine/rpm
Two wildcard characters make subscriptions flexible:
- + (single level): building/floor3/+/temperature matches all rooms on floor 3.
- # (multi level): building/floor3/# matches everything on floor 3: all rooms, all sensor types.
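import paho.mqtt.client as mqtt

# Print every message that arrives on a subscribed topic
def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")

# Written for paho-mqtt 1.x; paho-mqtt 2.x adds a callback_api_version
# argument, e.g. mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client = mqtt.Client()
client.on_message = on_message
client.connect("broker.example.com", 1883)
# Subscribe to all temperature sensors on floor 3
client.subscribe("building/floor3/+/temperature", qos=1)
# Subscribe to everything from truck 42
client.subscribe("fleet/truck-42/#", qos=0)
# Block and process network traffic, dispatching messages to on_message
client.loop_forever()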
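To make the wildcard rules concrete, here is a minimal sketch of how a broker might decide whether a topic matches a subscription filter. The function name `topic_matches` is hypothetical, and the sketch deliberately ignores special cases such as topics that begin with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the subscription `topic_filter`."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches this level and everything below it.
            # It is only valid as the last level of a filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must match exactly; "+" matches any single level.
            return False
    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


print(topic_matches("building/floor3/+/temperature",
                    "building/floor3/room301/temperature"))  # True
print(topic_matches("building/floor3/+/temperature",
                    "building/floor3/room301/humidity"))     # False
print(topic_matches("fleet/truck-42/#",
                    "fleet/truck-42/engine/rpm"))            # True
```

Splitting on `/` and comparing level by level mirrors the way the hierarchy is defined, which is why a flat topic name cannot benefit from wildcards.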
QoS Levels: The Core Trade-Off
MQTT's three QoS levels are its most important design decision. Each level trades reliability for overhead:
QoS 0 — At Most Once (Fire and Forget)
The publisher sends the message. The broker does not acknowledge. If the network drops the packet, the message is lost forever. This sounds dangerous, but it is perfect for high-frequency telemetry where missing one reading out of a thousand does not matter.
Overhead: 1 packet.
QoS 1 — At Least Once
The publisher sends the message and waits for a PUBACK from the broker. If no PUBACK arrives within a timeout, it resends. This guarantees delivery but may cause duplicates — the broker might have received the first message and the PUBACK got lost.
Overhead: 2 packets (PUBLISH + PUBACK).
QoS 2 — Exactly Once
A four-packet handshake ensures no duplication and no loss:
- Publisher → Broker: PUBLISH
- Broker → Publisher: PUBREC (received)
- Publisher → Broker: PUBREL (release)
- Broker → Publisher: PUBCOMP (complete)
This is expensive but essential for operations like financial transactions or actuator commands where duplicates cause real problems (opening a valve twice).
Overhead: 4 packets.
# QoS 0: Temperature reading every second — loss is acceptable
client.publish("sensors/temp", "22.5", qos=0)
# QoS 1: Alert that needs guaranteed delivery — duplicates are okay
client.publish("alerts/high-temp", "WARNING: 85°C", qos=1)
# QoS 2: Command to open a valve — must execute exactly once
client.publish("commands/valve-3/open", "1", qos=2)
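The duplicate risk at QoS 1 is easier to see in a toy simulation. The sketch below is not how any real broker or client library is implemented; `LossyBroker`, `publish_qos1`, and `deduplicate` are hypothetical names, and the "network" is just a counter that drops the first acknowledgments.

```python
class LossyBroker:
    """Toy broker that records every PUBLISH but 'loses' its first N PUBACKs."""

    def __init__(self, acks_to_drop: int):
        self.acks_to_drop = acks_to_drop
        self.received = []  # (packet_id, payload) for every PUBLISH that arrived

    def handle_publish(self, packet_id: int, payload: str) -> bool:
        self.received.append((packet_id, payload))
        if self.acks_to_drop > 0:
            self.acks_to_drop -= 1
            return False  # the PUBACK was lost on the way back
        return True  # the PUBACK reached the publisher


def publish_qos1(broker: LossyBroker, packet_id: int, payload: str,
                 max_attempts: int = 5) -> int:
    """Resend until a PUBACK arrives; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if broker.handle_publish(packet_id, payload):
            return attempt
    raise TimeoutError("no PUBACK received")


def deduplicate(received):
    """Keep the first copy of each packet ID, roughly the effect QoS 2 provides."""
    seen = set()
    unique = []
    for packet_id, payload in received:
        if packet_id not in seen:
            seen.add(packet_id)
            unique.append((packet_id, payload))
    return unique


broker = LossyBroker(acks_to_drop=1)
attempts = publish_qos1(broker, packet_id=7, payload="WARNING: 85°C")
print(attempts)                          # 2
print(len(broker.received))              # 2 (the broker saw the message twice)
print(len(deduplicate(broker.received))) # 1
```

If the consumer of a QoS 1 topic is idempotent, or deduplicates on some message identifier as above, the cheaper two-packet exchange is often sufficient.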
Retained Messages and Last Will
These two features solve the most common IoT pain points: late joiners and crash detection.
Retained Messages
When a message is published with the retain flag set, the broker stores the last message for that topic. Any future subscriber immediately receives this stored message upon subscribing — no need to wait for the next publish cycle.
# Publish current state as retained
client.publish("device/thermostat/status", '{"temp": 22, "mode": "heating"}',
               qos=1, retain=True)
# A subscriber connecting 2 hours later immediately receives this message
Without retained messages, a new dashboard loading at 3 AM would show blank values until each sensor publishes again. With retained messages, it immediately shows the last known state.
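The broker-side behavior can be sketched with a small in-memory model. `TinyBroker` is a hypothetical class for illustration only: it matches topics exactly (no wildcards), ignores QoS, and assumes an empty retained payload clears the stored message.

```python
class TinyBroker:
    """Toy broker that illustrates retained-message delivery to late subscribers."""

    def __init__(self):
        self.retained = {}     # topic -> last retained payload
        self.subscribers = {}  # topic -> list of callbacks

    def publish(self, topic, payload, retain=False):
        if retain:
            if payload == "":
                # An empty retained payload clears the stored message.
                self.retained.pop(topic, None)
            else:
                self.retained[topic] = payload
        # Deliver to everyone currently subscribed.
        for callback in self.subscribers.get(topic, []):
            callback(topic, payload)

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)
        # A late joiner immediately receives the stored message, if any.
        if topic in self.retained:
            callback(topic, self.retained[topic])


broker = TinyBroker()
broker.publish("device/thermostat/status", '{"temp": 22}', retain=True)

seen = []
broker.subscribe("device/thermostat/status",
                 lambda topic, payload: seen.append(payload))
print(seen)  # ['{"temp": 22}']
```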
Last Will and Testament (LWT)
When a client connects to the broker, it can register a "last will" message. If the client disconnects ungracefully (network loss, crash, power failure), the broker automatically publishes this message on the client's behalf.
client = mqtt.Client()
# Register LWT before connecting
client.will_set(
    topic="devices/sensor-42/status",
    payload="offline",
    qos=1,
    retain=True  # Combine with retain so the offline state persists
)
client.connect("broker.example.com", 1883)
# Publish online status after connecting
client.publish("devices/sensor-42/status", "online", qos=1, retain=True)
Now the monitoring system can subscribe to devices/+/status and always know which devices are online. If sensor-42 loses power, the broker publishes "offline" automatically. Combining LWT with retained messages means the offline status persists for any future subscriber.
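On the monitoring side, the handler for devices/+/status can be a small pure function. This is a sketch under the assumptions used in the example above (payloads are the strings "online" and "offline", and the device ID is the second topic level); `update_presence` is a hypothetical helper, not part of any library.

```python
def update_presence(presence: dict, topic: str, payload: str) -> dict:
    """Record whether a device is online, based on a devices/<id>/status message."""
    parts = topic.split("/")
    if len(parts) == 3 and parts[0] == "devices" and parts[2] == "status":
        device_id = parts[1]
        presence[device_id] = (payload == "online")
    return presence


presence = {}
update_presence(presence, "devices/sensor-42/status", "online")
update_presence(presence, "devices/sensor-7/status", "online")
# The broker publishes the will on behalf of sensor-42 after it loses power.
update_presence(presence, "devices/sensor-42/status", "offline")
print(presence)  # {'sensor-42': False, 'sensor-7': True}
```

In a real client, this function would be called from the `on_message` callback with `msg.topic` and the decoded payload.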
Why Not Just Use HTTP?
This is the question every web developer asks. Here is the concrete comparison:
| Aspect | HTTP | MQTT |
|---|---|---|
| Minimum overhead | ~300 bytes (headers) | 2 bytes |
| Connection model | Short-lived (or keep-alive) | Persistent |
| Direction | Client-initiated request/response | Bidirectional publish/subscribe |
| Push to device | Requires polling or SSE | Native (subscribe) |
| Offline handling | None | QoS 1/2 + clean_session=false queues messages |
| Battery impact | High (TLS handshake per request) | Low (one connection, tiny packets) |
For a sensor sending a 50-byte temperature reading every 10 seconds over a cellular connection, HTTP would add 300+ bytes of headers per request, require a TLS handshake per connection (or keep-alive management), and provide no way to push commands back to the device without polling.
MQTT sends that same reading in roughly 54 bytes plus the length of the topic name (2-byte fixed header + 2-byte topic length prefix + topic + 50-byte payload) over a single persistent connection that also receives commands via subscriptions. Over 24 hours, that sensor sends 8,640 readings, so the bandwidth savings add up to meaningful battery life on constrained devices.
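The per-message size can be worked out from the MQTT 3.1.1 PUBLISH packet layout. The following is a back-of-the-envelope sketch: the function names are hypothetical, the topic `sensors/temp` is just an example, and MQTT 5.0 adds at least one extra byte for the properties length.

```python
def remaining_length_bytes(n: int) -> int:
    """Bytes used by MQTT's variable-length 'remaining length' field (1 to 4)."""
    if n < 0 or n > 268_435_455:
        raise ValueError("remaining length out of range")
    size = 1
    while n > 127:
        n //= 128
        size += 1
    return size


def publish_packet_size(topic: str, payload_len: int, qos: int = 0) -> int:
    """Total on-the-wire size of an MQTT 3.1.1 PUBLISH packet, excluding TCP/TLS."""
    topic_bytes = len(topic.encode("utf-8"))
    # Variable header: 2-byte topic length + topic, plus a 2-byte packet ID for QoS 1/2.
    variable_header = 2 + topic_bytes + (2 if qos > 0 else 0)
    remaining = variable_header + payload_len
    # Fixed header: 1 control byte + the remaining-length field.
    return 1 + remaining_length_bytes(remaining) + remaining


size = publish_packet_size("sensors/temp", payload_len=50, qos=0)
readings_per_day = 24 * 60 * 60 // 10
print(size)                     # 66
print(size * readings_per_day)  # 570240 bytes per day
```

Shorter topic names reduce the per-message cost directly, since the topic string is repeated in every PUBLISH.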
MQTT 5.0: What Changed
MQTT 5.0 (2019) added features the IoT community had been requesting for years:
- Reason codes: Every acknowledgment now includes a reason code (success, quota exceeded, topic alias invalid, etc.), making debugging dramatically easier.
- Shared subscriptions: Multiple subscribers on $share/group/topic split the message stream, with each message going to only one member of the group (typically round-robin). This gives built-in load balancing without a custom solution.
- Topic aliases: Replace long topic strings with short integer aliases after the first message, reducing per-message overhead.
- Message expiry: Set a TTL on messages. Stale retained messages auto-delete instead of lingering forever.
- Request/response pattern: Optional correlation data and response topic headers enable request/response over MQTT without application-level workarounds.
- User properties: Key-value metadata on messages — think HTTP headers for MQTT.
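As an illustration of shared subscriptions, the sketch below parses a `$share/<group>/<filter>` subscription and rotates deliveries across group members. Round-robin is an assumption here, since the distribution strategy is up to the broker, and `parse_shared_filter` and `SharedGroup` are hypothetical names.

```python
from itertools import cycle


def parse_shared_filter(topic_filter: str):
    """Split '$share/<group>/<filter>' into (group, filter)."""
    parts = topic_filter.split("/", 2)
    if len(parts) != 3 or parts[0] != "$share":
        raise ValueError("not a shared subscription filter")
    return parts[1], parts[2]


class SharedGroup:
    """Toy dispatcher that hands each message to exactly one group member."""

    def __init__(self, members):
        self._next_member = cycle(members)

    def deliver(self, message):
        # Pick the next member in rotation; the others never see this message.
        return next(self._next_member), message


group_name, topic_filter = parse_shared_filter("$share/workers/sensors/+/temp")
print(group_name, topic_filter)  # workers sensors/+/temp

workers = SharedGroup(["worker-a", "worker-b"])
assignments = [workers.deliver(reading)[0] for reading in ["21.0", "21.5", "22.0"]]
print(assignments)  # ['worker-a', 'worker-b', 'worker-a']
```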
CoAP: The HTTP-Like Alternative
For devices that need request/response semantics (REST-style APIs for IoT), CoAP (Constrained Application Protocol, RFC 7252) is the alternative. It runs over UDP, uses compact binary headers, and maps cleanly to HTTP methods (GET, PUT, POST, DELETE).
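To show how compact the CoAP header is, here is a sketch that packs the 4-byte fixed header described in RFC 7252 for a confirmable GET. The helper name `coap_header` is hypothetical, and the sketch omits options (such as the URI path) and the payload.

```python
import struct

COAP_VERSION = 1
TYPE_CONFIRMABLE = 0  # CON: the receiver must acknowledge the message
CODE_GET = 0x01       # method code 0.01


def coap_header(msg_type: int, code: int, message_id: int, token: bytes = b"") -> bytes:
    """Pack the CoAP fixed header followed by the optional token (0 to 8 bytes)."""
    if len(token) > 8:
        raise ValueError("token must be at most 8 bytes")
    # First byte: 2-bit version, 2-bit type, 4-bit token length.
    first_byte = (COAP_VERSION << 6) | (msg_type << 4) | len(token)
    # Then an 8-bit code and a 16-bit message ID in network byte order.
    return struct.pack("!BBH", first_byte, code, message_id) + token


header = coap_header(TYPE_CONFIRMABLE, CODE_GET, message_id=0x1234)
print(header.hex())  # 40011234
print(len(header))   # 4
```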
CoAP is better for resource-oriented interactions (read sensor value, configure device settings). MQTT is better for event-driven pub/sub (sensor pushes updates, multiple systems consume). Many IoT architectures use both — CoAP for device management and configuration, MQTT for telemetry and events.
The choice between MQTT and CoAP often comes down to: does the system push data (MQTT) or does something poll for data (CoAP)? For most IoT telemetry, the answer is push — which is why MQTT dominates.
Key Points
- MQTT uses a publish/subscribe model where devices never communicate directly: the broker handles all routing, decoupling producers from consumers.
- The three QoS levels enable trading reliability for efficiency: QoS 0 for telemetry that can tolerate loss, QoS 2 for commands that must arrive exactly once.
- MQTT's overhead is tiny: a minimal packet is just 2 bytes. HTTP's minimum overhead is hundreds of bytes. This matters at scale with 10,000 battery-powered sensors.
- Retained messages solve the 'late joiner' problem: a new subscriber immediately gets the current state without waiting for the next publish cycle.
- Last Will and Testament (LWT) provides automatic offline detection. If a device loses connectivity, the broker publishes its pre-configured 'death' message.
Key Components
| Component | Role |
|---|---|
| MQTT Broker | Central message router that receives published messages and distributes them to all subscribers matching the topic filter |
| Topic Hierarchy | Slash-delimited namespace (home/kitchen/temperature) with wildcard subscriptions using + (single level) and # (multi level) |
| QoS Levels | Three delivery guarantees — QoS 0 (at most once, fire-and-forget), QoS 1 (at least once, may duplicate), QoS 2 (exactly once, four-packet handshake) |
| Retained Messages | Last message on a topic is stored by the broker and immediately delivered to any new subscriber, providing current state without waiting |
| Last Will and Testament (LWT) | A message pre-registered with the broker that is automatically published if the client disconnects ungracefully, signaling device offline status |
When to Use
Use MQTT when connecting IoT devices, sensors, or any constrained client that needs lightweight pub/sub messaging over unreliable networks. It fits well when flexible delivery guarantees (QoS 0-2), offline message queuing, and clean disconnect detection are required. Do not use it for request/response APIs (use HTTP/CoAP), large file transfers, or browser-to-browser communication.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Mosquitto | Open Source | Lightweight single-node MQTT broker, ideal for development and small deployments | Small-Medium |
| HiveMQ | Commercial | Enterprise MQTT broker with clustering, monitoring dashboard, and Kafka bridge | Medium-Enterprise |
| EMQX | Open Source | High-performance distributed MQTT broker handling millions of concurrent connections | Large-Enterprise |
| AWS IoT Core | Managed | Fully managed MQTT broker integrated with AWS services (Lambda, DynamoDB, S3) | Medium-Enterprise |
Debug Checklist
- Check broker connectivity with mosquitto_sub -h broker -t '#' -v to see all messages flowing through the broker.
- Verify topic subscriptions match publish topics exactly — MQTT topics are case-sensitive and 'Room1/temp' is not 'room1/temp'.
- Inspect QoS level mismatches — a publisher at QoS 0 and subscriber at QoS 2 still delivers at QoS 0. The minimum of the two QoS levels applies.
- Check retained message state — publish an empty retained message to clear stale state: mosquitto_pub -t topic -r -n.
- Monitor broker health — check connection count, message rate, subscription count, and memory usage in the broker's $SYS/# system topics.
Common Mistakes
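- Using QoS 2 for everything. The exactly-once four-packet handshake (PUBLISH → PUBREC → PUBREL → PUBCOMP) is expensive. Use QoS 0 for telemetry, QoS 1 for alerts and commands that are safe to repeat, and reserve QoS 2 for operations where a duplicate causes real harm.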
- Designing flat topic structures like device123-temperature. Use hierarchical topics (building/floor3/room301/temperature) to enable wildcard subscriptions.
- Publishing large payloads over MQTT. It supports up to 256MB messages, but it was designed for small sensor readings. For large data, use MQTT to signal availability and HTTP to download.
- Ignoring clean session semantics. With clean_session=false, the broker queues messages for offline clients. Thousands of offline devices with QoS 1 subscriptions can exhaust broker memory.
- Not setting up LWT messages. Without them, the system has no way to distinguish a device that has nothing to report from a device that has crashed.
Real World Usage
- AWS IoT Core uses MQTT as its primary protocol, handling billions of messages daily from industrial sensors, smart homes, and fleet management devices.
- Azure IoT Hub supports MQTT for device-to-cloud telemetry and cloud-to-device commands, with per-device authentication and message routing rules.
- Facebook Messenger originally used MQTT for mobile push notifications because of its minimal battery and bandwidth overhead on mobile networks.
- Tesla vehicles use MQTT for telemetry reporting (battery status, location, diagnostics) and receiving over-the-air update notifications.
- Smart home platforms like Home Assistant use MQTT as their universal device integration protocol, bridging hundreds of different device types.