Apache Pulsar
A messaging and streaming system built around separated compute and storage, with multi-tenancy and geo-replication baked in from day one
Why It Exists
Kafka showed the world that a distributed log can unify stream processing and message queuing. That was a big deal. But Kafka's architecture ties compute and storage together. Brokers own partitions. They store the data on local disk. When a broker is added, partitions rebalance, and rebalancing means copying terabytes of data across the network. Multi-tenancy is an afterthought. Geo-replication requires MirrorMaker, which is a whole separate operational headache.
Yahoo built Pulsar in 2013 to solve these problems (it was open-sourced in 2016). The core insight: separate the serving layer from the storage layer. Brokers handle client connections and protocol logic. Apache BookKeeper handles durable storage. Because brokers hold no data, adding, removing, or replacing them requires moving zero bytes. That same separation makes multi-tenancy practical at the namespace level, and it makes tiered storage possible so old data moves to S3 transparently.
Is it better than Kafka? Depends on the requirements. For most teams running a single cluster with straightforward streaming, Kafka is simpler, more mature, and has a much bigger ecosystem. Pulsar earns its complexity when the requirements genuinely include multi-tenancy, geo-replication, or long-term retention without burning money on SSDs.
How It Works
Topic Model: Topics live in a hierarchy: tenant, then namespace, then topic. Each level carries its own configuration. Tenants get resource quotas. Namespaces get their own retention and replication policies. Topics get schema enforcement. This hierarchy is what makes multi-tenancy real instead of cosmetic. Different teams sharing a cluster actually get isolation, not just different topic prefixes.
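A fully qualified topic name encodes this hierarchy directly: `persistent://tenant/namespace/topic`. A minimal sketch of pulling the levels apart (the parse helper here is illustrative, not part of any client library):

```python
# Sketch: split a fully qualified Pulsar topic name into its hierarchy levels.
# The persistent://tenant/namespace/topic format is Pulsar's standard naming
# scheme; TopicName and parse_topic are illustrative names, not client API.
from typing import NamedTuple

class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str      # a team or customer, carrying resource quotas
    namespace: str   # carries retention and replication policies
    topic: str       # the topic itself, with optional schema enforcement

def parse_topic(fqtn: str) -> TopicName:
    domain, rest = fqtn.split("://", 1)
    tenant, namespace, topic = rest.split("/", 2)
    return TopicName(domain, tenant, namespace, topic)

t = parse_topic("persistent://payments/prod/invoices")
assert t == ("persistent", "payments", "prod", "invoices")
```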
Message Flow: A producer sends a message to a broker. The broker figures out which BookKeeper ledger segment is active for that partition, writes the message across the configured write quorum of bookies, and acknowledges the producer once the ack quorum responds. Consumers connect to the broker, which pulls messages from BookKeeper and delivers them based on the subscription type. The broker itself stores nothing.
Subscriptions: This is where Pulsar gets interesting. Four modes, each solving a different problem. Exclusive gives one consumer the entire subscription with strict ordering (like a Kafka consumer on a single partition). Failover is active-standby with automatic switchover. Shared distributes messages round-robin across multiple consumers for horizontal scaling, but ordering is lost. Key_Shared hashes messages by key so the same key always hits the same consumer. Picking the wrong mode causes real pain; using Shared when ordering is needed is probably the most common mistake.
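The Key_Shared invariant can be sketched in a few lines. Real Pulsar uses murmur3 hashing over hash ranges and handles consumers joining and leaving; this toy version just maps key to consumer slot, which is enough to show why per-key ordering holds:

```python
# Toy model of Key_Shared routing: the same key always resolves to the same
# consumer, so messages for one key are processed in order by one consumer.
# Illustrative only; real Pulsar uses murmur3 over sticky hash ranges.
import zlib

def route(key: str, consumers: list) -> str:
    slot = zlib.crc32(key.encode()) % len(consumers)
    return consumers[slot]

consumers = ["c0", "c1", "c2"]
# Every message for order-42 lands on the same consumer, every time.
assert route("order-42", consumers) == route("order-42", consumers)
assert route("order-42", consumers) in consumers
```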
Architecture Deep Dive
Apache BookKeeper: BookKeeper is a distributed log storage system optimized for low-latency durable writes. Each bookie (BookKeeper node) splits its storage into two parts: the journal, a write-ahead log on a dedicated disk that fsyncs every write for durability, and ledger storage on a separate disk optimized for read throughput. When a broker writes a message, it sends the write to Qw bookies simultaneously. Each bookie writes to its journal, fsyncs, and acknowledges. The broker considers the write committed after Qa acknowledgments come back. In practice, this design hits sub-5ms write latency with strong durability. But only if journal and ledger sit on separate physical disks. Skip that, and latency spikes will follow.
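The commit rule above is simple enough to state as code. A toy model, assuming the doc's Qw=3/Qa=2 defaults (real BookKeeper pipelines many entries and tracks per-bookie state; this only shows the quorum arithmetic):

```python
# Toy model of a quorum write: the broker sends the entry to Qw bookies and
# treats it as committed once Qa acknowledgments arrive. Names illustrative.
def write_committed(acks_received: int, qa: int) -> bool:
    return acks_received >= qa

QW, QA = 3, 2                # write quorum 3, ack quorum 2
acks = [True, True, False]   # bookie 3 is slow or down
# Two of three bookies fsynced and acked, so the write is already committed;
# the producer is not held hostage by the slowest bookie.
assert write_committed(sum(acks), QA)
assert not write_committed(1, QA)   # a single ack is not enough
```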
Ledger Management: Topics are split into ledgers, and each ledger is an append-only log distributed across a set of bookies. When a ledger hits a size or time threshold, the broker closes it (making it immutable) and opens a new one. Closed ledgers can be offloaded to tiered storage. This is fundamentally different from Kafka, where a partition's data sits on one broker. In Pulsar, topic data is striped across the entire BookKeeper cluster, spreading I/O load more evenly. The tradeoff is an extra network hop for every read and write.
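The rollover behavior can be sketched as an append loop: write to the current ledger until it crosses a threshold, then seal it (immutable, offload-eligible) and open a new one. Class and field names here are illustrative:

```python
# Sketch of ledger rollover. Real Pulsar rolls over on size *or* time and
# assigns each ledger a new bookie ensemble; this toy keeps only the
# seal-and-open-new mechanic.
class LedgerLog:
    def __init__(self, max_entries=3):
        self.max_entries = max_entries
        self.closed = []        # sealed ledgers: immutable, offload-eligible
        self.current = []       # the single open ledger

    def append(self, entry):
        if len(self.current) >= self.max_entries:
            self.closed.append(tuple(self.current))  # seal: tuple = immutable
            self.current = []
        self.current.append(entry)

log = LedgerLog(max_entries=3)
for i in range(7):
    log.append(i)
# Seven entries with rollover at three: two sealed ledgers, one open ledger.
assert log.closed == [(0, 1, 2), (3, 4, 5)]
assert log.current == [6]
```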
Stateless Brokers: Since brokers hold no data, they are genuinely stateless. ZooKeeper coordinates topic ownership, assigning each topic partition to exactly one broker at a time. When a broker dies, its topics get reassigned to surviving brokers within seconds. The new broker picks up right where the old one stopped, reading from BookKeeper. No data copying. No hour-long rebalances. This is the single biggest operational advantage over Kafka, and it is real.
Geo-Replication: Each cluster runs its own brokers and bookies. Replication cursors track which messages have been forwarded to which clusters. A message published in US-East gets asynchronously replicated to EU-West and APAC. Each cluster keeps independent consumer offsets, so consumers in each region work at their own pace. This was Yahoo's original motivation for building Pulsar: they needed to replicate messaging data across 10+ global data centers, and bolting geo-replication onto an existing system was not working.
Tiered Storage: Old ledger segments (past a configurable age or size) get offloaded to object storage automatically. A consumer reading old data triggers a fetch from S3 instead of BookKeeper. Latency goes up, but cost drops dramatically. This is the right tradeoff for rarely accessed data that needs to stick around for compliance or replay purposes.
Where It Pulls Ahead of Kafka
The architecture differences are interesting. The practical differences are what actually change a design.
Parallelism without partition math. Kafka partitions are both a unit of storage and a unit of parallelism. Want 200 concurrent consumers? Need 200 partitions. Want 500 next quarter? Repartition, rebalance, migrate data. Pulsar's shared subscriptions decouple the two. The broker hands out messages to whatever consumers are connected. Scaling workers is just scaling pods. The topic does not care.
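The decoupling is easy to demonstrate: with a shared subscription, throughput per worker is a function of how many workers are connected, nothing else. A toy round-robin dispatcher (real Pulsar dispatches to whichever consumers have receive-permit capacity, not a strict ring):

```python
# Toy shared-subscription dispatcher: parallelism equals connected consumers,
# not partition count. Doubling the workers halves each worker's share with
# zero repartitioning. Purely illustrative.
from itertools import cycle
from collections import Counter

def dispatch(messages, consumers):
    assignment = Counter()
    ring = cycle(consumers)
    for _ in messages:
        assignment[next(ring)] += 1
    return assignment

msgs = range(200)
assert dispatch(msgs, [f"w{i}" for i in range(4)])["w0"] == 50
assert dispatch(msgs, [f"w{i}" for i in range(8)])["w0"] == 25
```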
Per-message acknowledgment. In Kafka, committing offset N means every message before N is also considered done. One slow message blocks progress for everything behind it. Pulsar tracks acks per message. Consumer 3 can finish message 47 while consumer 1 is still working on message 12. Neither blocks the other. For job workloads where execution time varies wildly (a 50ms API call next to a 10-minute video transcode), this changes everything.
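The difference is visible in a few lines. Under offset commits, "done" means a contiguous prefix is done, so one slow message pins the commit point; under per-message acks, only the genuinely unfinished messages are outstanding. An illustrative model, not client API:

```python
# Contrast offset-based commit with per-message acknowledgment.
def committed_offset(acked: set) -> int:
    """Highest N such that ALL of 0..N are acked (Kafka-style semantics)."""
    n = -1
    while n + 1 in acked:
        n += 1
    return n

acked = {0, 1, 3, 4, 5}               # message 2 is a slow 10-minute job
assert committed_offset(acked) == 1   # offset commit is stuck behind msg 2

# Per-message tracking: only message 2 is outstanding; 3, 4, 5 are truly done
# and will never be redelivered, regardless of what happens to 2.
outstanding = set(range(6)) - acked
assert outstanding == {2}
```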
Delayed and scheduled delivery. Kafka has no concept of "deliver this later." Systems that need retry backoff or scheduled execution build their own delay mechanism on top: sorted sets in Redis, database polling loops, delay topics with consumer pauses. Pulsar handles it at the broker. deliverAfter(30s) for retry backoff. deliverAt(2026-04-16T09:00:00Z) for cron-scheduled work. The message is invisible to consumers until the deadline passes. One fewer moving part.
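Conceptually this is a time-indexed queue inside the broker: messages sit invisible until their deadline passes. Pulsar's real delayed-delivery tracker is more elaborate; this heap-based sketch (illustrative names) just shows the visibility rule behind deliverAfter and deliverAt:

```python
# Minimal model of broker-side delayed delivery: a message is invisible to
# consumers until its deliver-at timestamp passes.
import heapq

class DelayQueue:
    def __init__(self):
        self._heap = []

    def publish(self, msg: str, deliver_at: float):
        heapq.heappush(self._heap, (deliver_at, msg))

    def poll(self, now: float):
        """Return every message whose deadline has passed."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[1])
        return ready

q = DelayQueue()
q.publish("retry-job", deliver_at=130.0)  # like deliverAfter(30s) at t=100
q.publish("cron-job", deliver_at=500.0)   # like deliverAt(fixed timestamp)
assert q.poll(now=100.0) == []            # nothing visible yet
assert q.poll(now=200.0) == ["retry-job"] # backoff elapsed, cron still hidden
```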
Dead-letter topics. When a consumer nacks a message enough times, Pulsar moves it to a dead-letter topic. Configurable per subscription: max redeliver count, DLT name, initial subscription. Kafka leaves this entirely to the application. Most teams end up building a DLQ state machine that tracks attempt counts in a database and moves failed records manually. Pulsar does it in one line of subscription config.
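The routing decision itself is tiny, which is exactly why it belongs in the broker rather than in every application. A toy version of the rule (field names illustrative; in the Java client this is configured via a DeadLetterPolicy on the consumer):

```python
# Sketch of dead-letter routing: after max_redeliver nacks, the message is
# diverted to the dead-letter topic instead of being redelivered again.
def handle_nack(nack_counts: dict, msg_id: str, max_redeliver: int = 3) -> str:
    nack_counts[msg_id] = nack_counts.get(msg_id, 0) + 1
    if nack_counts[msg_id] > max_redeliver:
        return "dead-letter-topic"   # broker moves it; the consumer moves on
    return "redeliver"

counts = {}
outcomes = [handle_nack(counts, "m1") for _ in range(4)]
# Three redeliveries, then the fourth nack trips the dead-letter policy.
assert outcomes == ["redeliver", "redeliver", "redeliver", "dead-letter-topic"]
```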
The catch. All of this comes wrapped in a system that needs ZooKeeper, BookKeeper, and brokers running together. Kafka 4.0 removed ZooKeeper entirely. Pulsar still requires it (or its metadata store abstraction). The operational gap is real and should not be handwaved. These features earn their cost only at scale or when the workload genuinely needs them.
Production Deployment
Minimum production setup: 3 ZooKeeper nodes, 3 BookKeeper bookies, 2+ brokers. That is 8 processes at minimum, and honestly more bookies are needed for any real workload. Use dedicated NVMe SSDs for BookKeeper journals and put them on separate disks from ledger storage. Journal latency directly controls write latency, so do not compromise here. Give BookKeeper about 25% of available RAM for direct memory (write cache and read cache). Set ensemble size (E) = 3, write quorum (Qw) = 3, and ack quorum (Qa) = 2 as a starting point.
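The E/Qw/Qa starting point above maps to the broker's managed-ledger defaults. A sketch of the relevant broker.conf entries (these are standard Pulsar settings, but verify names against your version's broker.conf before relying on them):

```properties
# broker.conf: default replication settings for new topics.
# Ensemble of 3 bookies, write every entry to all 3, commit after 2 acks.
managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2
```

Qa=2 with Qw=3 is the usual latency hedge: one slow bookie never delays a producer acknowledgment, while every entry still lands on three disks eventually.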
Key metrics to watch: pulsar_broker_topics_count, pulsar_subscription_back_log_size (backlog per subscription, the most important one), bookkeeper_server_ADD_ENTRY_LATENCY (write latency), bookkeeper_server_READ_ENTRY_LATENCY, and pulsar_broker_publish_latency. Set alerts when backlog crosses the SLO threshold or BookKeeper write latency exceeds 10ms. Without backlog size monitoring, slow consumers only surface when BookKeeper storage fills up and the cluster falls over.
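The two alerts described above, sketched as Prometheus rules. Metric names follow the ones listed in this section and the thresholds are placeholders; check them against your Pulsar version's actual exporter output (latency metrics in particular are often exposed as quantiles) before deploying:

```yaml
groups:
  - name: pulsar
    rules:
      - alert: PulsarSubscriptionBacklogHigh
        expr: pulsar_subscription_back_log_size > 100000   # derive from SLO
        for: 5m
      - alert: BookieWriteLatencyHigh
        expr: bookkeeper_server_ADD_ENTRY_LATENCY > 10     # ms, per the text
        for: 5m
```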
Yahoo built Pulsar and runs it at millions of messages per second. Tencent operates it with 10M+ topics. Splunk uses it as their event streaming backbone. These are real deployments, but keep in mind these are also companies with large infrastructure teams. If the team is three engineers, think carefully about whether it is realistic to operate this thing.
Pros
- • Brokers are stateless; BookKeeper handles storage. You can scale them independently, and replacing a dead broker takes seconds, not hours.
- • Multi-tenancy is a first-class concept. Tenant, namespace, topic hierarchy gives you per-namespace quotas and policies out of the box.
- • Geo-replication is built into the protocol. One admin command to set it up, no external tools needed.
- • Tiered storage moves old ledger segments to object storage automatically. Infinite retention without burning SSD budget.
- • Supports queuing (shared subscription) and streaming (exclusive/failover) in the same system, so you don't need two different platforms.
Cons
- • Operationally heavy. You need ZooKeeper + BookKeeper + Brokers. That is a minimum of 8 processes before you even publish a message.
- • The ecosystem is smaller than Kafka's. Fewer connectors, fewer managed offerings, fewer Stack Overflow answers.
- • Simple streaming workloads see higher tail latency compared to Kafka because of the extra BookKeeper hop.
- • Schema registry and exactly-once semantics still lag behind Kafka's implementations in maturity.
- • Managed cloud options are limited. Kafka has Confluent, MSK, Aiven, and more. Pulsar has StreamNative and not much else.
When to use
- • You actually need multi-tenancy with hard isolation between teams or customers
- • Geo-replication is a real requirement, not just a nice-to-have
- • You want queues and streaming topics in one system and are tired of running both RabbitMQ and Kafka
- • You need to retain messages for months or years without paying for SSD-tier storage the whole time
When NOT to use
- • Single-cluster streaming where Kafka works fine and has a decade of battle scars to prove it
- • Your team wants something simpler to operate. Kafka has fewer moving parts.
- • You already have deep Kafka Connect integrations and ecosystem tooling
- • Community support matters a lot to you. Kafka's community is 5-10x larger.
Key Points
- • Brokers are stateless. They read from and write to BookKeeper, so a failed broker gets replaced instantly. No local data to migrate, no rebalancing dance.
- • BookKeeper stores messages in ledgers (append-only log segments) striped across multiple bookies. With a write quorum of 3 and ack quorum of 2, a write is confirmed once 2 out of 3 bookies acknowledge.
- • Four subscription modes cover different patterns: Exclusive (one consumer, strict ordering), Failover (active-standby, ordered), Shared (round-robin across consumers, unordered), and Key_Shared (same key always goes to same consumer, partially ordered).
- • Tiered storage offloads old ledger segments to S3, GCS, or Azure Blob. The result is effectively infinite retention at roughly $0.02/GB/month instead of $0.10/GB/month on SSDs.
- • Geo-replication works at the protocol level. Configuring replication between clusters is a single admin command. Messages replicate asynchronously, with replication cursors tracking per-cluster progress.
- • Shared subscriptions break the partition ceiling. In Kafka, max parallelism equals the partition count. Period. In Pulsar, 200 consumers can share one subscription on one topic and each one pulls work independently. Adding a worker always adds throughput. No rebalancing, no partition math.
- • Per-message ack and nack. A consumer acknowledges or negatively acknowledges each message on its own. A slow job does not block the rest of the partition. A failed job gets redelivered with a configurable delay while everything else keeps moving.
- • Delayed and timed delivery at the broker level. deliverAfter(duration) holds a message for a relative delay (useful for retry backoff). deliverAt(timestamp) holds it until an exact wall-clock time (useful for cron: publish with deliverAt(nextFireTime) and skip the polling loop entirely). No external scheduler needed.
- • Native dead-letter topics. After a configurable number of nacks, the broker moves the message to a dead-letter topic automatically. No custom DLQ state machine, no manual tracking of retry counts in a database column.
Common Mistakes
- ✗ Under-provisioning BookKeeper. Pulsar's write throughput is gated entirely by BookKeeper performance. Run at least 3 bookies with NVMe SSDs, and put journal and ledger on separate disks. Skipping this is the number one cause of production pain.
- ✗ Not setting retention policies. By default, Pulsar deletes messages as soon as all subscriptions acknowledge them. Without an explicit retention policy, acknowledged messages are gone forever and cannot be replayed.
- ✗ Using shared subscriptions when ordering is needed. Shared subscriptions distribute messages round-robin with zero ordering guarantees. Use Key_Shared for per-key ordering or Exclusive for total ordering.
- ✗ Ignoring backlog quotas. Without producer_request_hold or producer_exception policies, a slow consumer builds an unbounded backlog that fills BookKeeper storage and eventually crashes the cluster.
- ✗ Co-locating ZooKeeper with BookKeeper on the same nodes. ZooKeeper needs low-latency disk I/O for metadata. BookKeeper hammers the disk with heavy writes. Put them together and the result is ZK session timeouts and cascading instability.