Apache Kafka
Apache Kafka is a distributed event streaming platform: a high-throughput, append-only log that producers write to and consumers read from at their own pace. Kafka is the de facto backbone for event-driven architectures, real-time analytics pipelines, and inter-service messaging at scale.
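The append-only-log model can be illustrated with a toy in-memory sketch (this is an illustration of the concept, not Kafka's actual API or storage format): records are appended to a per-partition list, each append returns an offset, and independent consumers read from whatever offset they choose.

```python
from collections import defaultdict

class ToyLog:
    """Toy append-only log: one list of records per (topic, partition)."""

    def __init__(self):
        self._records = defaultdict(list)

    def append(self, topic: str, partition: int, value: bytes) -> int:
        records = self._records[(topic, partition)]
        records.append(value)
        return len(records) - 1  # offset assigned to the new record

    def read(self, topic: str, partition: int, offset: int) -> list:
        # Reads never mutate the log; consumers just track their own offset.
        return self._records[(topic, partition)][offset:]

log = ToyLog()
log.append("clicks", 0, b"page=/home")
log.append("clicks", 0, b"page=/cart")

# Two consumers at different positions see different slices of the same log:
assert log.read("clicks", 0, 0) == [b"page=/home", b"page=/cart"]
assert log.read("clicks", 0, 1) == [b"page=/cart"]
```

Because reads are positional and non-destructive, many consumers can share one log without coordinating with each other or with producers.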
Core concepts
- Topic. A named, ordered log of records, split into one or more partitions.
- Partition. The unit of parallelism; records in one partition are strictly ordered.
- Producer. Writes records to topics, optionally with a key that controls partition assignment.
- Consumer group. A set of consumers that divide a topic's partitions among themselves; each partition is consumed by exactly one member of the group at a time, so the group scales out up to the partition count.
- Offset. A consumer's position within a partition; committed back to Kafka so the consumer can resume where it left off after a restart.
- Brokers and replicas. The servers that form the cluster and store partitions; each partition is replicated across brokers for durability, with one replica acting as leader.
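Key-based partition assignment is what ties keys and ordering together: Kafka's default partitioner hashes a record's key (with murmur2) modulo the partition count, so all records with the same key land in the same partition and stay ordered. A minimal sketch of the idea, using `zlib.crc32` as a stand-in hash rather than Kafka's actual murmur2:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the key and map it onto a partition. Kafka uses murmur2 here;
    # crc32 is a stand-in to keep the sketch dependency-free.
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition, which is what preserves
# per-key ordering across producer sends.
p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
assert p1 == p2
assert 0 <= p1 < 6
```

Note the corollary: changing the number of partitions changes the key-to-partition mapping, which is why repartitioning a keyed topic breaks ordering guarantees for existing keys.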
Common use cases
- Event-driven microservices: producers publish domain events, many services subscribe
- Log and clickstream ingestion before warehousing in Snowflake, BigQuery, or Redshift
- Change-data-capture (CDC) via Debezium streaming database changes into Kafka
- Stream processing with Kafka Streams, Flink, ksqlDB
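Stream processing over Kafka typically means maintaining state while consuming an unbounded record stream. As a rough illustration of what something like Kafka Streams' `groupByKey().count()` computes (plain Python here, not the Streams API), a stateful aggregation consumes `(key, value)` records and emits an updated count per key after each one:

```python
from collections import Counter

def count_by_key(events):
    """Consume a stream of (key, value) records and emit the running
    count per key after each record, akin to a KTable's changelog."""
    counts = Counter()
    changelog = []
    for key, _value in events:
        counts[key] += 1
        changelog.append((key, counts[key]))
    return changelog

stream = [("user-1", "click"), ("user-2", "click"), ("user-1", "view")]
assert count_by_key(stream) == [("user-1", 1), ("user-2", 1), ("user-1", 2)]
```

Frameworks like Kafka Streams and Flink add what this sketch omits: fault-tolerant state stores, repartitioning so that all records for a key reach the same processor, and exactly-once semantics.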
Alternatives
Redpanda (Kafka-API compatible, simpler operations), AWS Kinesis, Google Pub/Sub, Azure Event Hubs, NATS JetStream, Apache Pulsar.