Amazon SQS
AWS's managed message queue. No servers, no clusters, no headaches.
Why It Exists
Anyone who has ever managed a RabbitMQ or ActiveMQ cluster in production knows the drill. Provision servers. Configure clustering. Handle failover. Watch disk space. Coordinate rolling upgrades during maintenance windows. It is real work, and for most teams on AWS, it is work that adds zero business value.
SQS has been around since 2006. It was one of the first AWS services, and that longevity says something: the core abstraction is solid. Creating a queue takes a single API call. Producers send messages. Consumers pull them off. AWS handles replication across multiple Availability Zones, scaling, patching, all of it. The underlying infrastructure is invisible.
That simplicity is the whole point. SQS scales from one message a day to millions per second without any configuration changes. There is no capacity planning. No broker sizing. It just works.
How It Works
Queue Types: SQS offers two queue types with very different trade-offs. Standard queues offer virtually unlimited throughput with at-least-once delivery and best-effort ordering. "Best-effort" is doing a lot of heavy lifting in that sentence. Messages can arrive more than once, and they can arrive out of order, especially under high throughput. Consumers need to be idempotent. No exceptions.
FIFO queues provide exactly-once processing and strict ordering within a message group, but at the cost of throughput limits: 300 API calls per second per action, or 3,000 messages per second with batching, and higher still with high-throughput mode enabled. For most workloads that need ordering, those limits are fine. But for serious volume with ordering requirements, Kafka is probably the better fit.
Message Lifecycle: A producer sends a message. SQS replicates it across multiple AZs. A consumer calls ReceiveMessage and gets the message back. At that point, SQS starts a visibility timeout, hiding the message from other consumers. The consumer does its work and calls DeleteMessage. Done.
If the consumer crashes or takes too long, the message pops back into the queue and another consumer picks it up. After a configurable number of failed attempts (maxReceiveCount), the message lands in a dead-letter queue where it can be inspected, debugged, and acted on.
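Here is a minimal sketch of that lifecycle in Python with boto3. The queue URL, DLQ ARN, and the `process` function are placeholders, and the numbers are illustrative:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # placeholder
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"              # placeholder

# Attach a dead-letter queue: after 5 failed receives, a message moves to the DLQ.
sqs.set_queue_attributes(
    QueueUrl=QUEUE_URL,
    Attributes={"RedrivePolicy": json.dumps(
        {"deadLetterTargetArn": DLQ_ARN, "maxReceiveCount": "5"})},
)

def process(body: str) -> None:
    ...  # application-specific work

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,   # batch up to 10 messages per request
        WaitTimeSeconds=20,       # long polling
        VisibilityTimeout=60,     # hide received messages from other consumers for 60s
    )
    for msg in resp.get("Messages", []):
        process(msg["Body"])
        # Delete only after processing succeeds; if we crash first, the message
        # reappears once the visibility timeout expires.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```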
Message Groups (FIFO): This is where FIFO queues get interesting. Each message gets assigned a MessageGroupId. Messages within the same group arrive strictly in order, one at a time. The next message in a group does not get delivered until the previous one is deleted. But messages in different groups process independently, in parallel. So ordering per-customer or per-order still allows parallelism across entities. It is a smart design.
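A rough sketch of producing to a FIFO queue, assuming a hypothetical orders.fifo queue keyed per customer:

```python
import json
import boto3

sqs = boto3.client("sqs")
FIFO_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"  # placeholder

# Events for the same customer share a MessageGroupId, so they are delivered in
# order; messages in other groups are consumed in parallel.
sqs.send_message(
    QueueUrl=FIFO_URL,
    MessageBody=json.dumps({"order_id": "o-1001", "status": "placed"}),
    MessageGroupId="customer-42",
    MessageDeduplicationId="o-1001-placed",  # or enable content-based deduplication
)
```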
Architecture Deep Dive
Internal Architecture: SQS replicates messages across multiple servers in multiple AZs before even acknowledging the send. The exact internals are proprietary, but AWS has published that SQS targets eleven 9s (99.999999999%) of durability and three 9s (99.9%) of availability. In practice, I have never lost a message in SQS. That does not mean it cannot happen, but the odds are vanishingly small.
Polling Model: SQS uses a pull-based model. Consumers poll the queue for messages. This is one of its quirks compared to push-based systems.
Short polling returns immediately and only queries a subset of SQS servers. This means an empty response can come back even when messages exist on servers that were not queried. It is wasteful and surprising the first time it happens.
Long polling (set WaitTimeSeconds between 1 and 20) queries all SQS servers and waits until a message shows up or the timeout expires. It eliminates almost all empty responses and costs less. Always use long polling in production. There is no good reason not to.
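Long polling can be requested per ReceiveMessage call with WaitTimeSeconds, or made the default for every consumer of the queue. A quick sketch of the latter:

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # placeholder

# Make 20-second long polling the queue-level default.
sqs.set_queue_attributes(
    QueueUrl=QUEUE_URL,
    Attributes={"ReceiveMessageWaitTimeSeconds": "20"},
)
```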
Lambda Integration: SQS and Lambda fit together naturally. Set up an event source mapping, and Lambda polls the queue automatically. It invokes the function with batches of messages, handles scaling based on queue depth, and returns failed messages to the queue automatically. For FIFO queues, Lambda limits concurrency per message group to preserve ordering.
This is the lowest-effort way to build a queue consumer on AWS. Write a function, point it at a queue, and walk away. For many workloads, that is all it takes.
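A minimal Python handler for an SQS-triggered Lambda might look like the sketch below, assuming the event source mapping has ReportBatchItemFailures enabled so that only failed messages return to the queue (`do_work` is a hypothetical business function):

```python
import json

def handler(event, context):
    failures = []
    for record in event["Records"]:          # each record is one SQS message
        try:
            payload = json.loads(record["body"])
            do_work(payload)                 # hypothetical business logic
        except Exception:
            # Report this message as failed; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```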
Fan-Out Pattern (SNS + SQS): One of the most useful patterns in AWS. A producer publishes an event to an SNS topic. Multiple SQS queues subscribe to that topic, and each gets its own copy of the message. Each queue has its own consumer.
So a single "order placed" event can trigger payment processing, inventory updates, and notification sending all independently, without the producer knowing about any of those consumers. Add a new consumer by subscribing a new queue. Remove one by unsubscribing. The producer never changes.
Extended Client Library: The 256KB message size limit is a real constraint. The SQS Extended Client Library (available for Java, Python, and others) works around it by storing the actual message body in S3 and passing a reference pointer through SQS. On the consumer side, the library detects the pointer, downloads from S3, and hands over the full message. It supports payloads up to 2GB. It works well, though the system now depends on S3 availability for every message.
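If the library is not an option, the send side of the pointer pattern is easy to hand-roll. A rough sketch, with an arbitrary bucket name and the 256KB threshold:

```python
import json
import uuid
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

BUCKET = "my-large-payloads"   # placeholder bucket
THRESHOLD = 256 * 1024         # SQS message size limit in bytes

def send_large(queue_url: str, body: str) -> None:
    if len(body.encode()) <= THRESHOLD:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)
        return
    # Store the real payload in S3 and send only a pointer through SQS;
    # the consumer detects the pointer and fetches the body from S3.
    key = f"payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode())
    sqs.send_message(QueueUrl=queue_url,
                     MessageBody=json.dumps({"s3_bucket": BUCKET, "s3_key": key}))
```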
Production Patterns
Backpressure Management: SQS absorbs traffic spikes naturally. Producers can send faster than consumers process, and the queue just grows. That is the whole point of a queue.
But it needs monitoring. Monitor ApproximateNumberOfMessagesVisible (queue depth) and ApproximateAgeOfOldestMessage to catch growing backlogs before they become a problem. Scale consumers based on queue depth using CloudWatch alarms and Auto Scaling, or let Lambda handle it automatically.
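A sketch of a queue-depth alarm with boto3; the queue name, threshold, and scaling action ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="orders-queue-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "orders"}],  # placeholder queue name
    Statistic="Average",
    Period=60,                  # evaluate once a minute
    EvaluationPeriods=5,
    Threshold=10000,            # alarm once the backlog passes 10k messages
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:example"],  # placeholder
)
```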
Exactly-Once Processing with Standard Queues: Standard queues can deliver duplicates. If exactly-once behavior is needed but FIFO throughput limits are too restrictive, build idempotency at the application level. Assign each message a unique deduplication ID. Track processed IDs in DynamoDB with conditional writes. When a duplicate arrives, the conditional write fails and the consumer skips it.
It is more code than using FIFO, but it works at much higher throughput. Pick the trade-off that fits the workload.
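A minimal idempotency check along those lines, assuming a hypothetical DynamoDB table named processed_messages with a string partition key message_id:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE = "processed_messages"  # assumed table, partition key "message_id"

def claim(message_id: str) -> bool:
    """Return True if this message has not been processed before."""
    try:
        dynamodb.put_item(
            TableName=TABLE,
            Item={"message_id": {"S": message_id}},
            # The conditional write fails if the ID was already recorded.
            ConditionExpression="attribute_not_exists(message_id)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate delivery; skip processing
        raise
```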
Cost Optimization: SQS charges per API request ($0.40 per million for standard, $0.50 for FIFO). Batch operations carry up to 10 messages per request and cut request costs by up to 90%. Long polling kills empty receive costs. At 100 million messages/day with batching, send requests alone run roughly $120/month on standard or $150 on FIFO; receive and delete requests add to that, but the total is still cheap for a managed queue.
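Batched sends in boto3 look like this; each entry needs an Id that is unique within the batch, and the queue URL is a placeholder:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # placeholder

events = [{"order_id": f"o-{i}"} for i in range(10)]

# One request carries up to 10 messages, so this is billed as a single API call.
resp = sqs.send_message_batch(
    QueueUrl=QUEUE_URL,
    Entries=[{"Id": str(i), "MessageBody": json.dumps(e)} for i, e in enumerate(events)],
)
if resp.get("Failed"):
    # Partial failures are possible; retry just the failed entries.
    ...
```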
Real-World Scale: Netflix processes billions of messages per day through SQS to decouple their microservices. That says a lot about what SQS can handle at scale. For the vast majority of teams on AWS, SQS is the right default choice for messaging. It is not the most powerful option. It is not the most flexible. But it is the one that causes the least operational pain, and in most cases, that matters more than anything else.
Pros
- Fully managed. No infrastructure to provision, patch, or scale.
- Virtually unlimited throughput on standard queues, no pre-provisioning needed
- At-least-once delivery with automatic retries and dead-letter queues
- FIFO queues give you exactly-once processing and strict ordering
- Deep AWS integration (Lambda, SNS, EventBridge, Step Functions)
Cons
- Maximum message size is 256KB. Larger payloads need the S3 pointer pattern.
- Standard queues deliver at-least-once, not exactly-once
- FIFO queues cap at 300 messages/sec (3,000 with batching; higher with high-throughput mode)
- No message replay. Once you delete a message, it is gone.
- Vendor lock-in to the AWS ecosystem
When to use
- You are on AWS and need a simple, reliable message queue
- You want zero operational overhead for messaging infra
- You are decoupling services that do not need streaming semantics
- Serverless architectures with Lambda-based consumers
When NOT to use
- You need event streaming with replay and consumer groups (look at Kafka or Pulsar)
- Multi-cloud or on-premise deployments where portability matters
- High-throughput ordered streaming, because FIFO throughput limits will bite you
- Complex routing patterns (use RabbitMQ or an SNS+SQS combo instead)
Key Points
- Standard queues provide at-least-once delivery with best-effort ordering. Messages can arrive more than once and out of order, so consumers must be idempotent.
- FIFO queues provide exactly-once processing and strict ordering within a message group. Each message group ID creates an independent ordered sequence, enabling different groups to be processed in parallel.
- Visibility timeout controls how long a message hides from other consumers after one consumer receives it. Set it to 6x the average processing time. If the consumer crashes, the message reappears for another consumer to pick up.
- Long polling (WaitTimeSeconds=20) cuts empty receives by 90%+ and saves money. Without it, consumers poll aggressively, get empty responses, and burn API calls at $0.40 per million.
- The SQS Extended Client Library offloads messages larger than 256KB to S3 transparently. The queue stores a pointer to the S3 object, and the client library handles upload and download automatically.
Common Mistakes
- ✗ Getting the visibility timeout wrong. Too short and multiple consumers process the same message at the same time. Too long and failed messages sit in limbo before they become available for retry. Measure the p99 processing time and set the timeout to 6x that value.
- ✗ Skipping dead-letter queue setup. Without a DLQ, poison messages retry until the retention period expires, eating consumer resources. Always configure a DLQ with maxReceiveCount between 3 and 5, and set up monitoring on DLQ depth.
- ✗ Using standard queues when ordering matters. Standard queues give best-effort ordering that breaks under load. If message order is critical, use FIFO queues with appropriate message group IDs.
- ✗ Not batching send/receive operations. Sending messages one at a time costs 10x more than batching. Use SendMessageBatch and ReceiveMessage with MaxNumberOfMessages=10 to cut API calls by 90%.
- ✗ Deleting messages before processing finishes. If the message is deleted and then the consumer crashes during post-processing, that work is lost. Only delete after all processing, including downstream writes, is confirmed.