Apache Kafka

Apache Kafka is a distributed event streaming platform: a high-throughput, append-only log that producers write to and consumers read from at their own pace. Kafka is the de facto backbone for event-driven architectures, real-time analytics pipelines, and inter-service messaging at scale.

Core concepts

  • Topic. A named, ordered log of records, split into one or more partitions.
  • Partition. The unit of parallelism; records in one partition are strictly ordered.
  • Producer. Writes records to topics, optionally with a key that controls partition assignment.
  • Consumer group. A set of consumers that share a topic; each partition is consumed by exactly one consumer in the group at a time.
  • Offset. A consumer's position in a partition; stored in Kafka so consumers resume after restarts.
  • Brokers and replicas. Cluster nodes that store partitions; partitions are replicated for durability.
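The key-to-partition mapping mentioned above can be sketched in a few lines. Kafka's Java producer hashes the record key with murmur2; the sketch below substitutes CRC32 for simplicity, so the exact partition numbers differ from a real cluster, but the essential property is the same: equal keys always land in the same partition, which is what preserves per-key ordering.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministically map a record key to a partition.
    Real Kafka uses murmur2; CRC32 stands in here for illustration."""
    return zlib.crc32(key) % num_partitions

# Records with the same key always route to the same partition.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
assert p1 == p2
```

Records without a key are instead spread across partitions (round-robin or sticky batching, depending on client version), trading per-key ordering for even load.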

Common use cases

  • Event-driven microservices: producers publish domain events; many services subscribe independently
  • Log and clickstream ingestion before warehousing in Snowflake, BigQuery, or Redshift
  • Change data capture (CDC): Debezium streams database changes into Kafka
  • Stream processing with Kafka Streams, Flink, ksqlDB
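To make the stream-processing idea concrete, here is a minimal word count in plain Python, mimicking what a Kafka Streams topology does: consume records, update per-key state, and keep a running aggregate. The in-memory list stands in for a real topic and the dict for a state store; this is a sketch of the pattern, not the Kafka Streams API.

```python
from collections import defaultdict

def count_words(records):
    """Stateful per-key aggregation, the 'hello world' of stream processing.
    Each input record is a line of text; the result is a count per word."""
    state = defaultdict(int)  # stands in for a Kafka Streams state store
    for line in records:
        for word in line.split():
            state[word] += 1
    return dict(state)

counts = count_words(["to be", "or not to be"])
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

In a real deployment the state store is backed by a changelog topic, so a restarted instance can rebuild its counts by replaying the log.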

Alternatives

Redpanda (Kafka-API compatible, simpler operations), AWS Kinesis, Google Pub/Sub, Azure Event Hubs, NATS JetStream, Apache Pulsar.
