// event streaming · partitions · consumer groups · delivery semantics · stream processing · senior → principal
payments). Each topic is split into partitions — ordered, immutable sequences of records. Each record has an offset, its unique position within a partition. Ordering is guaranteed within a partition; none across partitions. Consumers track their own offset — they can replay from any point.
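Time/size retention (cleanup.policy=delete): segments are removed once they are older than retention.ms or the partition grows beyond retention.bytes. Use when you care about the event stream up to a time window.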
Log compaction — retain only the latest value per key. Tombstone (null-value) records signal deletion. Use for changelog topics and materializing current state.
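
To make the compaction rule concrete, here is a minimal in-memory sketch. It is not Kafka's log cleaner: the `(offset, key, value)` record shape and the names `compact` and `materialize` are assumptions made for illustration, and a `None` value stands in for a tombstone.

```python
from typing import Optional

# (offset, key, value); a value of None models a tombstone.
Record = tuple[int, str, Optional[str]]


def compact(log: list[Record], drop_tombstones: bool = False) -> list[Record]:
    """Keep only the highest-offset record per key, preserving offset order."""
    latest: dict[str, Record] = {}
    for record in log:
        latest[record[1]] = record  # later offsets overwrite earlier ones
    survivors = sorted(latest.values(), key=lambda r: r[0])
    if drop_tombstones:
        # Models the log after the tombstone retention window has passed.
        survivors = [r for r in survivors if r[2] is not None]
    return survivors


def materialize(log: list[Record]) -> dict[str, str]:
    """Rebuild current state by replaying a log; tombstones delete the key."""
    state: dict[str, str] = {}
    for _, key, value in log:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


log = [
    (0, "user-1", "theme=light"),
    (1, "user-2", "theme=dark"),
    (2, "user-1", "theme=dark"),   # supersedes offset 0
    (3, "user-2", None),           # tombstone: user-2 was deleted
    (4, "user-3", "theme=light"),
]
compacted = compact(log)
```

Replaying `compacted` yields the same state as replaying the full `log`, which is the property that makes compacted topics usable for rebuilding current state. Note that offsets survive compaction unchanged, so the compacted log has gaps.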
Use CooperativeStickyAssignor to avoid stop-the-world rebalances. Monitor rebalance frequency: frequent rebalancing is a consumer stability problem, not a Kafka problem.
If processing cannot keep up with the poll loop, fix it by reducing max.poll.records, optimizing processing, or moving slow work off the poll loop.
A new consumer group with auto.offset.reset=latest misses all historical data and starts only from the moment it's deployed. Teams often don't notice until days later. For any consumer that needs a complete view, use earliest. Always verify committed offsets exist before deploying new consumer groups to production.

| Producer setting | Guidance |
|---|---|
| acks | 0 = none, 1 = leader, all = full ISR. Use `all` for durability. |
| enable.idempotence | true: the broker deduplicates retries using PID + sequence number. |
| retries | Set high with idempotence. Default MAX_INT in modern clients. |
| linger.ms | How long to wait before flushing a batch. Higher = better throughput, more latency. |
| batch.size | Max bytes per batch per partition. Larger batches = better compression. |
| compression.type | none / gzip / snappy / lz4 / zstd. snappy or lz4 for throughput; zstd for ratio. |
| max.in.flight.requests.per.connection | Keep ≤5 with idempotence. >1 without idempotence risks reordering on retry. |
| buffer.memory | Total bytes the producer can buffer before send() blocks. Tune for burst traffic. |
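
The PID + sequence number check behind enable.idempotence can be illustrated with a toy model. This is a simplified sketch, not the broker's implementation (real brokers track sequences per batch and keep a short window of recent batches); the class `PartitionLog` and its return strings are hypothetical names chosen for the example.

```python
class PartitionLog:
    """Toy model of one partition's broker-side idempotence check."""

    def __init__(self) -> None:
        self.records: list[str] = []
        # Producer ID -> last sequence number appended for this partition.
        self.last_seq: dict[int, int] = {}

    def append(self, pid: int, seq: int, value: str) -> str:
        last = self.last_seq.get(pid, -1)
        if seq <= last:
            # A retry of something already written: acknowledge, do not append.
            return "duplicate"
        if seq != last + 1:
            # A gap means an earlier write is missing: reject to keep ordering.
            return "out_of_order"
        self.records.append(value)
        self.last_seq[pid] = seq
        return "appended"


partition = PartitionLog()
results = [
    partition.append(pid=7, seq=0, value="a"),
    partition.append(pid=7, seq=1, value="b"),
    partition.append(pid=7, seq=1, value="b"),  # network retry of the same send
    partition.append(pid=7, seq=3, value="d"),  # seq 2 never arrived
]
```

The retried send is acknowledged without being written twice, which is why retries can be set high once idempotence is on.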

| Consumer setting | Guidance |
|---|---|
| auto.offset.reset | earliest (full replay) / latest (from now) / none (throw if no offset). Set intentionally. |
| enable.auto.commit | false for correctness. Commit manually after processing. |
| max.poll.interval.ms | Max time between poll() calls. Processing of each batch must complete within this window. |
| max.poll.records | Max records per poll. Tune down if processing is slow. |
| session.timeout.ms | Max time without a heartbeat before the group coordinator considers the consumer dead. |
| heartbeat.interval.ms | How often the consumer sends a heartbeat. ~1/3 of session timeout. |
| isolation.level | read_committed: only see committed transactions. Required for EOS consumers. |
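
The relationship between max.poll.records, max.poll.interval.ms, and the heartbeat settings is simple arithmetic, sketched below. The dataclass, the function names, the 50% headroom factor, and the numeric values are all illustrative assumptions, not recommendations from the Kafka documentation.

```python
from dataclasses import dataclass


@dataclass
class ConsumerTimings:
    max_poll_records: int
    max_poll_interval_ms: int
    session_timeout_ms: int
    heartbeat_interval_ms: int


def fits_poll_interval(cfg: ConsumerTimings, worst_ms_per_record: float) -> bool:
    """True if a full batch at worst-case latency finishes before the deadline."""
    return cfg.max_poll_records * worst_ms_per_record < cfg.max_poll_interval_ms


def max_safe_records(cfg: ConsumerTimings, worst_ms_per_record: float,
                     headroom: float = 0.5) -> int:
    """Largest batch that uses at most `headroom` of the poll interval."""
    return int(cfg.max_poll_interval_ms * headroom / worst_ms_per_record)


def heartbeat_ok(cfg: ConsumerTimings) -> bool:
    """Heartbeat interval should be roughly a third of the session timeout or less."""
    return cfg.heartbeat_interval_ms * 3 <= cfg.session_timeout_ms


# Example values only.
cfg = ConsumerTimings(
    max_poll_records=500,
    max_poll_interval_ms=300_000,
    session_timeout_ms=45_000,
    heartbeat_interval_ms=3_000,
)
slow = fits_poll_interval(cfg, worst_ms_per_record=800)   # 500 * 800 ms = 400 s
fast = fits_poll_interval(cfg, worst_ms_per_record=100)   # 500 * 100 ms = 50 s
suggested = max_safe_records(cfg, worst_ms_per_record=800)
```

With 800 ms per record a 500-record batch overruns the 300 s window, so the consumer would be evicted mid-batch; capping the batch near `suggested` keeps it inside the window with room to spare.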

| Topic / broker setting | Guidance |
|---|---|
| replication.factor | 3 in production. 1 = no fault tolerance. |
| min.insync.replicas | 2 with RF=3 is the standard safety setting. |
| retention.ms | How long to keep messages. Default 7 days. Set per topic. |
| cleanup.policy | delete (time/size retention) or compact (keep latest per key). |
| unclean.leader.election.enable | false (default). Never elect an out-of-ISR replica; this avoids data loss. |
| replica.lag.time.max.ms | How long before a straggling replica is dropped from ISR. |

| Dimension | Apache Kafka | RabbitMQ | Amazon SQS | Apache Pulsar |
|---|---|---|---|---|
| Throughput | Very high (millions/sec) | Medium (tens of thousands/sec) | Medium (managed, auto-scales) | Very high — comparable to Kafka |
| Message ordering | Guaranteed within a partition | Per queue (single consumer) | No guarantee (FIFO queue: per group ID) | Guaranteed within a partition |
| Retention & replay | Configurable days/weeks. Full replay anytime. | Consumed = deleted. No replay. | Consumed = deleted. Max 14-day retention. | Tiered storage — near-unlimited retention. |
| Consumer model | Pull. Consumer groups share partitions. | Push or pull. Competing consumers. | Pull (long-poll). Competing consumers. | Push + pull. Shared, failover, key-shared. |
| Delivery semantics | At-least-once default. Exactly-once via transactions. | At-least-once with manual acks. | At-least-once. Standard queue can duplicate. | At-least-once default. Exactly-once with transactions. |
| Ops complexity | High — cluster management, partition tuning. | Medium — easier at smaller scale. | Very low — fully managed by AWS. | High — BookKeeper adds operational complexity. |
| Best for | Event streaming, audit logs, multi-consumer fan-out. | Task queues, work distribution, complex routing. | Cloud-native decoupling, serverless triggers. | Kafka replacement with tiered storage, multi-tenancy. |
A rebalance is triggered when a consumer joins or leaves the group, a consumer is considered dead (max.poll.interval.ms exceeded or missed heartbeat), or partition count changes. During an eager rebalance (the only protocol before Kafka 2.4), all consumers stop consuming, drop all partitions, and wait for the group coordinator to reassign: a full stop-the-world pause. Cooperative rebalancing (CooperativeStickyAssignor) only revokes and moves the partitions that need to change, leaving the rest untouched. Minimize impact: use cooperative rebalancing, tune session.timeout.ms and heartbeat.interval.ms appropriately, and ensure max.poll.records is low enough that processing finishes within max.poll.interval.ms.

If rebalances are caused by exceeding max.poll.interval.ms, the real fix is reducing batch size or moving slow processing off the poll loop, not raising the timeout.

With enable.idempotence=true, the broker assigns each producer a PID (Producer ID) and tracks a sequence number per partition. If a message arrives with the same PID and sequence number as one already written, the broker silently deduplicates it. This gives exactly-once semantics at the producer-to-broker level. Modern Kafka clients (3.0+) enable idempotence by default, which requires acks=all and retries > 0.

A replica is dropped from the ISR when it falls behind the leader for longer than replica.lag.time.max.ms (default 30s), typically due to being overloaded, slow disk, or a network partition. min.insync.replicas defines the minimum ISR members that must acknowledge a write before the leader returns success (when acks=all). If ISR drops below min.insync.replicas, the partition becomes unavailable for acks=all writes: producers receive NotEnoughReplicasException.

The combination of replication.factor, min.insync.replicas, and acks=all is how you tune the availability vs. durability dial. Standard production setup: RF=3, min.insync.replicas=2, acks=all. You tolerate one broker failure while still requiring two replicas to ack every write. You lose write availability before you lose data. Setting min.insync.replicas=1 restores availability but loses the durability guarantee.
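
The availability vs. durability trade-off between replication.factor, min.insync.replicas, and acks can be worked through with a small model. This is a sketch under simplifying assumptions: every failed broker hosts a replica of the partition, and the ISR is exactly the set of surviving replicas. `WriteOutcome` and `write_outcome` are hypothetical names.

```python
from typing import NamedTuple


class WriteOutcome(NamedTuple):
    accepted: bool
    copies_at_ack: int  # replicas holding the record when the producer gets its ack


def write_outcome(replication_factor: int, min_insync_replicas: int,
                  acks: str, replicas_down: int) -> WriteOutcome:
    if acks not in ("0", "1", "all"):
        raise ValueError(f"unknown acks value: {acks!r}")
    isr = replication_factor - replicas_down
    if isr <= 0:
        return WriteOutcome(False, 0)      # no replica left to lead the partition
    if acks == "all":
        if isr < min_insync_replicas:
            return WriteOutcome(False, 0)  # producer sees NotEnoughReplicasException
        return WriteOutcome(True, isr)     # every current ISR member has the record
    if acks == "1":
        return WriteOutcome(True, 1)       # only the leader is guaranteed to have it
    return WriteOutcome(True, 0)           # acks=0: fire and forget


# RF=3, min.insync.replicas=2, acks=all with 0, 1, and 2 replicas down.
standard = [write_outcome(3, 2, "all", down) for down in range(3)]
# Same failures with min.insync.replicas=1.
relaxed = [write_outcome(3, 1, "all", down) for down in range(3)]
```

With the standard setup the third case rejects the write rather than accepting it on a single copy; with min.insync.replicas=1 the same write is accepted but exists on only one replica, so losing that broker loses the record.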
Know this triangle cold: it comes up in every Kafka reliability discussion.

Tombstones are kept for delete.retention.ms so consumers can observe deletions. Use compaction when you care about current state, not history: entity state changes, user settings, feature flags, CDC events where only the latest row matters.

Compaction is not immediate. The log cleaner only runs once the dirty ratio (min.cleanable.dirty.ratio, default 0.5) is reached, and the active segment is never compacted, so the log can still contain duplicates. Tombstones are retained after compaction for delete.retention.ms; consumers must read them within that window or they'll miss deletes during bootstrap. If you're using a compacted topic to rebuild state (Kafka Streams, cache reload), test the bootstrap path explicitly. It's where compaction-based systems quietly lose data.

Monitor consumer lag with kafka-consumer-groups.sh --describe, the Datadog Kafka integration, or Burrow.

Dead-letter pattern: consume from payments; if processing fails → write to payments.DLQ with headers capturing original topic, partition, offset, error type, retry count, and timestamp → commit the offset on the main topic so the consumer advances. A separate DLQ consumer handles alerting, inspection, and reprocessing. Committing the offset is critical: without it, you loop on the same record forever.

max.poll.interval.ms is the maximum time a consumer can go between poll() calls. If processing a batch takes longer than this, the coordinator considers the consumer dead, triggers a rebalance, and reassigns the partition to another consumer, which starts from the last committed offset, picks up the same slow batch, and also times out. The result is a rebalance loop with no progress. Root causes: batch size too large (max.poll.records), individual record processing is expensive, or GC pauses interrupting the poll loop. Don't just raise the timeout; that masks the problem.

Fixes: reduce max.poll.records; move slow downstream I/O to an async worker with careful offset management; or redesign the pipeline so that you consume fast into an internal queue, let a thread pool process, and commit after confirmed completion.
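
The "careful offset management" part of the thread-pool approach is the piece that usually goes wrong: workers finish out of order, but only a contiguous prefix of offsets is safe to commit. The sketch below shows one way to track that for a single partition. `OffsetTracker`, `process_batch`, and `handler` are hypothetical names, and no Kafka client is involved; in a real consumer the returned value would be passed to the client's commit call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


class OffsetTracker:
    """Tracks finished offsets for one partition and the safe commit position."""

    def __init__(self, next_offset: int) -> None:
        self._next_to_commit = next_offset  # lowest offset not yet confirmed done
        self._done: set[int] = set()

    def mark_done(self, offset: int) -> None:
        self._done.add(offset)
        # Advance only across a contiguous run of completed offsets.
        while self._next_to_commit in self._done:
            self._done.remove(self._next_to_commit)
            self._next_to_commit += 1

    def committable(self) -> int:
        """Offset to commit (Kafka convention: last processed offset + 1)."""
        return self._next_to_commit


def process_batch(offsets: Iterable[int], handler: Callable[[int], None],
                  tracker: OffsetTracker, workers: int = 4) -> int:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(handler, offset): offset for offset in offsets}
        # mark_done runs on the calling thread only, so no lock is needed.
        for future in as_completed(futures):
            if future.exception() is None:
                tracker.mark_done(futures[future])
    return tracker.committable()


def handler(offset: int) -> None:
    if offset == 3:
        raise RuntimeError("downstream call failed")


commit_at = process_batch(range(6), handler, OffsetTracker(0))
```

Offsets 4 and 5 finished, but because 3 failed the commit position stays at 3, so a restart reprocesses 3, 4, and 5. That is the at-least-once trade-off: some repeated work in exchange for never skipping a record.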
If the bottleneck is an external API with rate limits, consider whether a task queue is better suited than Kafka for this consumer.

acks=all means "durable as long as ISR is healthy." If ISR shrinks below min.insync.replicas, producer writes fail with NotEnoughReplicasException. This is intended behavior, trading availability for safety. Design your producer error handling for this: should producers block and retry (tolerate backpressure), circuit-break and write to a local buffer, or fail fast to the caller? The answer depends on whether availability or consistency is the stronger requirement for that topic. Make it an explicit decision.

A transactional producer sets a transactional.id, coordinates with a transaction coordinator broker, and uses a two-phase commit. Consumers with isolation.level=read_committed only see committed transaction messages. In a consume-transform-produce pattern (Kafka Streams), the consumer offset commit and producer writes can happen in the same transaction, giving exactly-once for Kafka-to-Kafka flows.

auto.offset.reset only applies when a consumer group has no usable committed offset for a partition: a brand new group, a group whose committed offsets have expired, or a committed offset that is no longer in the log. earliest: start from the earliest available offset (full replay of what is retained). latest: start from the current end (miss all history). none: throw NoOffsetForPartitionException. A group with valid committed offsets ignores this setting entirely. Dangerous scenario: a consumer group's committed offsets expire (default offset retention is 7 days). On restart, the group has no offsets, auto.offset.reset=latest kicks in, and the consumer silently starts from the current end, missing potentially days of events.

auto.offset.reset=latest is the default in many framework setups and is a footgun for teams building event-sourced or stateful systems. For any consumer that builds state from the event history, always use earliest and test the bootstrap path explicitly.
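
The decision the consumer makes at startup can be written out as a small function. This is an illustrative model, not client code: `starting_offset` and the local exception class are hypothetical, and `log_start` / `log_end` stand for the earliest retained offset and the next offset to be written.

```python
from typing import Optional


class NoOffsetForPartition(Exception):
    """Stand-in for the client's NoOffsetForPartitionException."""


def starting_offset(committed: Optional[int], policy: str,
                    log_start: int, log_end: int) -> int:
    # A valid committed offset always wins; the policy is not consulted.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    if policy == "earliest":
        return log_start
    if policy == "latest":
        return log_end
    if policy == "none":
        raise NoOffsetForPartition("no committed offset and policy is 'none'")
    raise ValueError(f"unknown auto.offset.reset policy: {policy!r}")


# Partition currently retains offsets 1000..4999; next write goes to 5000.
resumed = starting_offset(1500, "latest", log_start=1000, log_end=5000)
expired = starting_offset(None, "latest", log_start=1000, log_end=5000)
replay = starting_offset(None, "earliest", log_start=1000, log_end=5000)
```

The `expired` case is the dangerous scenario: the group used to be at 1500, its offsets expired, and it now resumes at 5000 without any error, skipping 3500 records.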
Operationally: before deploying a new consumer group to production, verify committed offsets exist with kafka-consumer-groups.sh --describe; if they don't, seed them with --reset-offsets --to-earliest --execute before starting the consumer. Make this a deployment checklist item.

The producer uses enable.idempotence=true. The consumer uses manual offset commit. Processing pattern: read record → upsert to database using the Kafka offset or a business key as the idempotency key → commit offset. An upsert (INSERT ... ON CONFLICT DO UPDATE) ensures that reprocessing the same record produces the same database state. The database transaction and the Kafka offset commit are separate, so there's an at-least-once gap, which is why the upsert must be idempotent.

Alternatively, record each handled event in a table keyed by processed_event_id. On restart, the consumer queries this table before processing; if the event ID is already there, skip and commit the offset. This moves deduplication into the database layer, which is transactionally consistent. For high-throughput sinks, bloom filters or in-memory deduplication windows can replace the database check for recent events.

Use tiered retry topics: payments.retry-1 (retry after 1m), payments.retry-2 (after 5m), payments.retry-3 (after 30m), then payments.DLQ. Each retry topic is consumed by a worker that waits the appropriate delay before reprocessing. Headers on DLQ records carry: original topic, partition, offset, error cause, attempt count, timestamps.

Run kafka-leader-election.sh --election-type preferred to trigger preferred leader election, redistributing leaders to their original assignment without moving data. For persistent imbalance, use kafka-reassign-partitions.sh to move partition replicas to underutilized brokers.

Run reassignments with --throttle to avoid impacting producers and consumers. Longer-term: make sure auto.leader.rebalance.enable is on (it is by default) so leaders redistribute automatically after broker restarts.
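
Before running a preferred leader election it helps to see how skewed leadership actually is. The sketch below assumes you have already collected each partition's replica list and current leader (for example from topic describe output); the dictionaries and function names are hypothetical. It relies on the fact that a partition's preferred leader is the first broker in its replica list.

```python
from collections import Counter


def leader_counts(current_leaders: dict[str, int]) -> Counter:
    """Number of partitions each broker currently leads."""
    return Counter(current_leaders.values())


def non_preferred(assignments: dict[str, list[int]],
                  current_leaders: dict[str, int]) -> list[str]:
    """Partitions whose leader is not the first replica in the assignment."""
    return sorted(
        partition
        for partition, replicas in assignments.items()
        if current_leaders[partition] != replicas[0]
    )


assignments = {
    "payments-0": [1, 2, 3],
    "payments-1": [2, 3, 1],
    "payments-2": [3, 1, 2],
    "payments-3": [1, 3, 2],
}
# Example state after brokers 2 and 3 restarted: broker 1 took over everything.
current = {"payments-0": 1, "payments-1": 1, "payments-2": 1, "payments-3": 1}

before = leader_counts(current)
moved = non_preferred(assignments, current)
# What a preferred election would restore.
after = leader_counts({p: replicas[0] for p, replicas in assignments.items()})
```

Here a preferred election fixes the skew without moving data. If `after` is still lopsided, the replica assignment itself is unbalanced and only a reassignment will help.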
The root cause of persistent hot brokers is usually initial topic creation without considering broker distribution. Address this in your topic provisioning process.

Authorization: ACLs managed with kafka-acls.sh grant DESCRIBE, READ, WRITE, CREATE per principal per topic/group. More fine-grained: Confluent RBAC or Apache Ranger.
Encryption: TLS (SSL listener) for data in transit between clients and brokers. Encryption at rest is handled at the disk/OS layer, not by Kafka natively.

Apply a client quota to the offending producer (kafka-configs.sh --entity-type clients --entity-name producer-id --add-config producer_byte_rate=...) without restarting it. Check if the surge is intentional (a backfill job) or a bug. Scale out consumers that are lagging (up to partition count). If specific broker partitions are stressed, check broker disk I/O, since high-retention topics may be competing for disk bandwidth. Prevent cascading to unrelated groups: separate high-throughput, bursty topics onto their own broker rack or cluster so a surge doesn't contend for shared broker resources.

Check kafka-consumer-groups.sh --describe and look for partitions whose LAG keeps growing or that show CONSUMER-ID=- (no active consumer assigned).
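
The same check can be automated once the per-partition numbers are collected. The sketch below assumes the log end offset, committed offset, and consumer ID have already been gathered (for instance by parsing the describe output); `PartitionStatus`, `lag`, and `needs_attention` are hypothetical names and the threshold is an arbitrary example.

```python
from typing import NamedTuple, Optional


class PartitionStatus(NamedTuple):
    partition: int
    log_end_offset: int
    committed_offset: Optional[int]  # None when the group has no committed offset
    consumer_id: Optional[str]       # None when no consumer is assigned


def lag(status: PartitionStatus) -> Optional[int]:
    """Records between the committed position and the end of the log."""
    if status.committed_offset is None:
        return None
    return max(0, status.log_end_offset - status.committed_offset)


def needs_attention(statuses: list[PartitionStatus],
                    lag_threshold: int) -> list[tuple[int, str]]:
    flagged = []
    for status in statuses:
        current_lag = lag(status)
        if status.consumer_id is None:
            flagged.append((status.partition, "no active consumer"))
        elif current_lag is None:
            flagged.append((status.partition, "no committed offset"))
        elif current_lag > lag_threshold:
            flagged.append((status.partition, f"lag {current_lag}"))
    return flagged


statuses = [
    PartitionStatus(0, 10_000, 9_990, "consumer-a"),
    PartitionStatus(1, 10_000, 2_000, "consumer-b"),
    PartitionStatus(2, 10_000, 9_500, None),
    PartitionStatus(3, 10_000, None, "consumer-a"),
]
alerts = needs_attention(statuses, lag_threshold=1_000)
```

A single snapshot only shows how far behind a partition is; alerting on lag that grows across successive snapshots is usually more useful than a fixed threshold.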