Apache Kafka

Apache Kafka is a distributed event streaming platform: a high-throughput, append-only log that producers write to and consumers read from at their own pace. Kafka is the de facto backbone for event-driven architectures, real-time analytics pipelines, and inter-service messaging at scale.

Core concepts

  • Topic. A named, ordered log of records, split into one or more partitions.
  • Partition. The unit of parallelism; records in one partition are strictly ordered.
  • Producer. Writes records to topics, optionally with a key that controls partition assignment.
  • Consumer group. A set of consumers that share a topic; each partition is consumed by exactly one consumer in the group at a time.
  • Offset. A consumer's position in a partition; stored in Kafka so consumers resume after restarts.
  • Brokers and replicas. Cluster nodes that store partitions; partitions are replicated for durability.
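The key-to-partition mapping mentioned above can be sketched in a few lines. Kafka's Java producer hashes the record key with murmur2; the sketch below substitutes CRC32 for simplicity, so the exact partition numbers differ from a real cluster, but the essential property is the same: equal keys always land in the same partition, which is what preserves per-key ordering.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministically map a record key to a partition.
    Real Kafka uses murmur2; CRC32 stands in here for illustration."""
    return zlib.crc32(key) % num_partitions

# Records with the same key always route to the same partition.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
assert p1 == p2
```

Records without a key are instead spread across partitions (round-robin or sticky batching, depending on client version), trading per-key ordering for even load.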

Common use cases

  • Event-driven microservices: producers publish domain events; many services subscribe independently
  • Log and clickstream ingestion before warehousing in Snowflake, BigQuery, or Redshift
  • Change data capture (CDC): Debezium streams database changes into Kafka
  • Stream processing with Kafka Streams, Flink, ksqlDB
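To make the stream-processing idea concrete, here is a minimal word count in plain Python, mimicking what a Kafka Streams topology does: consume records, update per-key state, and keep a running aggregate. The in-memory list stands in for a real topic and the dict for a state store; this is a sketch of the pattern, not the Kafka Streams API.

```python
from collections import defaultdict

def count_words(records):
    """Stateful per-key aggregation, the 'hello world' of stream processing.
    Each input record is a line of text; the result is a count per word."""
    state = defaultdict(int)  # stands in for a Kafka Streams state store
    for line in records:
        for word in line.split():
            state[word] += 1
    return dict(state)

counts = count_words(["to be", "or not to be"])
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

In a real deployment the state store is backed by a changelog topic, so a restarted instance can rebuild its counts by replaying the log.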

Alternatives

Redpanda (Kafka-API compatible, simpler operations), AWS Kinesis, Google Pub/Sub, Azure Event Hubs, NATS JetStream, Apache Pulsar.
