Apache Kafka: The Complete Guide for 2026
Apache Kafka is the backbone of event-driven architectures at companies processing billions of events per day. Whether you are building real-time analytics pipelines, microservice communication layers, or change data capture systems, Kafka provides the durable, high-throughput, fault-tolerant messaging infrastructure you need. This guide covers everything from core concepts to production deployment.
Table of Contents
- What Is Kafka and When to Use It
- Core Concepts
- Architecture: ZooKeeper vs KRaft
- Setting Up Kafka with Docker Compose
- Producing Messages
- Consuming Messages
- Topic Management
- Configuration Tuning
- Kafka Connect
- Kafka Streams
- Schema Registry
- Exactly-Once Semantics
- Monitoring
- Production Best Practices
- Kafka vs Alternatives
- Troubleshooting
- FAQ
1. What Is Kafka and When to Use It
Apache Kafka is a distributed event streaming platform. Unlike traditional message brokers that delete messages after delivery, Kafka persists events to disk in an ordered, immutable log. This means multiple consumers can read the same data independently, and you can replay events from any point in time.
Use Kafka when you need:
- Event streaming — real-time data pipelines between systems
- Event sourcing — recording every state change as an immutable event
- Log aggregation — collecting logs from many services into a central platform
- Change data capture (CDC) — streaming database changes to downstream systems
- Metrics collection — ingesting high-volume telemetry data
- Decoupling microservices — asynchronous communication without tight coupling
Do not use Kafka when: you need simple request-reply messaging, your throughput is under 1,000 messages per second (a lighter broker like RabbitMQ or Redis Streams is simpler), or you need complex routing logic such as topic exchanges with per-message routing keys.
2. Core Concepts
Topics are named feeds of messages. A topic called user-events might hold every click, signup, and purchase event. Topics are append-only logs, not queues — messages are not deleted after consumption.
Partitions are the unit of parallelism. Each topic is split into one or more partitions. Messages within a partition are strictly ordered. Producers can direct messages to specific partitions using a key, ensuring all events for the same entity land in the same partition and are processed in order.
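The key-to-partition mapping can be sketched as a deterministic hash modulo the partition count. This is a conceptual sketch: Kafka's default partitioner actually applies murmur2 to the key bytes, and md5 stands in here only to keep the example dependency-free.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's default partitioner, which hashes the key
    # bytes with murmur2. The property that matters is determinism:
    # the same key always maps to the same partition.
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return digest % num_partitions

# Every event keyed by "user-123" lands in the same partition of a
# 6-partition topic, so per-user ordering is preserved.
p = partition_for("user-123", 6)
assert all(partition_for("user-123", 6) == p for _ in range(100))
```

Note that the mapping depends on the partition count, which is one reason increasing partitions later breaks key-based ordering guarantees.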
Brokers are the Kafka server processes. A cluster typically runs 3 or more brokers for fault tolerance. Each partition has one leader broker that handles all reads and writes, and one or more follower brokers that replicate the data.
Producers write messages to topics. They choose which partition each message goes to: explicitly, by hashing the key, or (for keyless messages) via a sticky strategy that fills one batch per partition before moving to the next.
Consumers read messages from topics. Each consumer tracks its position (offset) in each partition.
Consumer Groups enable parallel consumption. Kafka assigns each partition to exactly one consumer in the group. If you have 6 partitions and 3 consumers in the same group, each consumer handles 2 partitions. Adding a 4th consumer triggers a rebalance. Having more consumers than partitions means some consumers sit idle.
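The one-partition-per-consumer rule can be sketched with a simple round-robin assignment. Kafka's real assignors (range, cooperative-sticky) are more sophisticated, but they preserve the same invariant shown here.

```python
def assign(partitions: list, consumers: list) -> dict:
    # Each partition goes to exactly one consumer in the group;
    # consumers beyond the partition count receive nothing and sit idle.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 6 partitions, 3 consumers: each consumer handles 2 partitions.
print(assign(list(range(6)), ["c1", "c2", "c3"]))

# 6 partitions, 8 consumers: 2 consumers are left idle.
groups = assign(list(range(6)), [f"c{i}" for i in range(8)])
idle = [c for c, ps in groups.items() if not ps]
print(len(idle))  # -> 2
```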
Offsets are sequential IDs for each message in a partition. Consumers commit offsets to track which messages they have processed. This is how Kafka enables replay: reset the offset to an earlier position and reprocess events.
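A partition behaves like the minimal append-only log below (a conceptual sketch, not Kafka's on-disk format): offsets are stable positions, so replay is just reading again from an earlier one.

```python
class PartitionLog:
    """Conceptual sketch of one partition: an append-only list whose
    indices are the offsets. Reading deletes nothing, so any number of
    consumers can replay from any offset independently."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset assigned to this record

    def read_from(self, offset: int) -> list:
        return self._records[offset:]

log = PartitionLog()
for event in ["login", "purchase", "logout"]:
    log.append(event)

assert log.read_from(0) == ["login", "purchase", "logout"]  # full replay
assert log.read_from(2) == ["logout"]                       # resume at offset 2
```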
3. Architecture: ZooKeeper vs KRaft
Historically, Kafka relied on Apache ZooKeeper to manage cluster metadata: broker registration, topic configurations, partition leadership, and consumer group coordination. This added operational complexity since you had to deploy and monitor a separate ZooKeeper ensemble (typically 3 or 5 nodes).
KRaft mode (Kafka Raft) eliminates ZooKeeper entirely. Kafka brokers manage metadata internally using the Raft consensus protocol. A subset of brokers act as controllers that maintain the metadata log.
KRaft advantages over ZooKeeper:
- No separate ZooKeeper cluster to deploy and monitor
- Faster partition leadership changes (milliseconds instead of seconds)
- Support for millions of partitions per cluster
- Simpler configuration and deployment
- A single security model (no separate ZooKeeper authentication)
ZooKeeper support was removed entirely in Kafka 4.0, so all new deployments should use KRaft mode.
4. Setting Up Kafka with Docker Compose (KRaft Mode)
The modern approach uses KRaft mode, eliminating the ZooKeeper dependency. This docker-compose.yml gives you a single-broker development cluster.
services:
kafka:
image: apache/kafka:3.8
container_name: kafka
ports:
- "9092:9092"
environment:
KAFKA_NODE_ID: 1
KAFKA_PROCESS_ROLES: broker,controller
KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_LOG_DIRS: /var/lib/kafka/data
CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
volumes:
- kafka-data:/var/lib/kafka/data
healthcheck:
test: ["CMD-SHELL", "kafka-broker-api-versions.sh --bootstrap-server localhost:9092"]
interval: 30s
timeout: 10s
retries: 5
start_period: 30s
volumes:
kafka-data:
For a three-broker production-like cluster:
services:
kafka-1:
image: apache/kafka:3.8
environment:
KAFKA_NODE_ID: 1
KAFKA_PROCESS_ROLES: broker,controller
KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-1:9092
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
KAFKA_DEFAULT_REPLICATION_FACTOR: 3
KAFKA_MIN_INSYNC_REPLICAS: 2
CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
kafka-2:
image: apache/kafka:3.8
environment:
KAFKA_NODE_ID: 2
KAFKA_PROCESS_ROLES: broker,controller
KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-2:9092
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
KAFKA_DEFAULT_REPLICATION_FACTOR: 3
KAFKA_MIN_INSYNC_REPLICAS: 2
CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
kafka-3:
image: apache/kafka:3.8
environment:
KAFKA_NODE_ID: 3
KAFKA_PROCESS_ROLES: broker,controller
KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-3:9092
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
KAFKA_DEFAULT_REPLICATION_FACTOR: 3
KAFKA_MIN_INSYNC_REPLICAS: 2
CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk
5. Producing Messages
# Create a topic first
kafka-topics.sh --bootstrap-server localhost:9092 \
--create --topic user-events --partitions 6 --replication-factor 1
# Produce simple messages (type and press Enter for each message)
kafka-console-producer.sh --bootstrap-server localhost:9092 \
--topic user-events
> Hello Kafka
> This is a second message
# Produce key-value pairs (messages with the same key go to the same partition)
kafka-console-producer.sh --bootstrap-server localhost:9092 \
--topic user-events \
--property parse.key=true \
--property key.separator=:
> user-123:{"event":"login","ts":"2026-02-12T10:00:00Z"}
> user-456:{"event":"signup","ts":"2026-02-12T10:01:00Z"}
> user-123:{"event":"purchase","ts":"2026-02-12T10:05:00Z"}
Key-based partitioning ensures all events for user-123 land in the same partition and are consumed in order. This is critical for maintaining per-entity event ordering.
6. Consuming Messages
# Consume new messages (waits for incoming messages)
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic user-events
# Consume from the beginning of the topic
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic user-events --from-beginning
# Consume with keys and timestamps displayed
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic user-events --from-beginning \
--property print.key=true \
--property print.timestamp=true
# Consume as part of a consumer group
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic user-events --group my-app-group
# Check consumer group status and lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--describe --group my-app-group
7. Topic Management
# Create a topic with specific settings
kafka-topics.sh --bootstrap-server localhost:9092 \
--create --topic orders \
--partitions 12 \
--replication-factor 3 \
--config retention.ms=604800000 \
--config cleanup.policy=delete
# List all topics
kafka-topics.sh --bootstrap-server localhost:9092 --list
# Describe a topic (shows partitions, replicas, ISR)
kafka-topics.sh --bootstrap-server localhost:9092 \
--describe --topic orders
# Alter topic configuration
kafka-configs.sh --bootstrap-server localhost:9092 \
--alter --entity-type topics --entity-name orders \
--add-config retention.ms=259200000
# Increase partitions (cannot decrease)
kafka-topics.sh --bootstrap-server localhost:9092 \
--alter --topic orders --partitions 24
# Delete a topic
kafka-topics.sh --bootstrap-server localhost:9092 \
--delete --topic orders
8. Configuration Tuning
Retention
# Time-based retention (default: 7 days)
log.retention.hours=168
# Size-based retention (per partition)
log.retention.bytes=1073741824 # 1 GB per partition
# Compacted topics (keep latest value per key)
cleanup.policy=compact
min.compaction.lag.ms=3600000 # don't compact messages younger than 1 hour
# Delete + compact (hybrid)
cleanup.policy=compact,delete
Replication
# Replication factor (set at topic creation)
default.replication.factor=3
# Minimum in-sync replicas required for writes
min.insync.replicas=2
# With replication.factor=3 and min.insync.replicas=2,
# you can lose 1 broker and still accept writes
# Producer acknowledgment
acks=all # wait for all in-sync replicas (strongest guarantee)
Producer Tuning
# Batching: accumulate messages before sending
batch.size=65536 # 64 KB batch
linger.ms=10 # wait up to 10ms to fill a batch
# Compression (reduces network and storage)
compression.type=lz4 # options: none, gzip, snappy, lz4, zstd
# Retries and idempotence
retries=2147483647 # effectively infinite
enable.idempotence=true # prevent duplicates on retry
max.in.flight.requests.per.connection=5 # safe with idempotence enabled
Consumer Tuning
# Fetch tuning
fetch.min.bytes=1024 # wait for at least 1KB of data
fetch.max.wait.ms=500 # or wait at most 500ms
max.poll.records=500 # max records per poll()
# Session and heartbeat
session.timeout.ms=45000 # consumer considered dead after 45s
heartbeat.interval.ms=15000 # send heartbeat every 15s
max.poll.interval.ms=300000 # max time between poll() calls
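The heartbeat and session settings above are coupled: the Kafka documentation recommends keeping heartbeat.interval.ms at no more than one third of session.timeout.ms, so a consumer can miss a couple of heartbeats before the broker evicts it. A quick sanity check of that rule of thumb:

```python
def heartbeat_settings_ok(session_timeout_ms: int, heartbeat_interval_ms: int) -> bool:
    # Rule of thumb from the Kafka docs: the heartbeat interval should be
    # no more than 1/3 of the session timeout, so a consumer survives a
    # few dropped heartbeats before being considered dead.
    return heartbeat_interval_ms * 3 <= session_timeout_ms

assert heartbeat_settings_ok(45000, 15000)       # the values used above
assert not heartbeat_settings_ok(45000, 30000)   # too close to the timeout
```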
9. Kafka Connect
Kafka Connect is a framework for streaming data between Kafka and external systems without writing code. Source connectors pull data into Kafka; sink connectors push data from Kafka to external systems.
# Example: JDBC Source Connector (database to Kafka)
{
"name": "postgres-source",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://db:5432/myapp",
"connection.user": "kafka_reader",
"connection.password": "${file:/secrets/db-password.txt:password}",
"table.whitelist": "orders,customers",
"mode": "timestamp+incrementing",
"timestamp.column.name": "updated_at",
"incrementing.column.name": "id",
"topic.prefix": "db.",
"poll.interval.ms": 5000,
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "(.*)",
"transforms.route.replacement": "cdc.$1"
}
}
# Example: Elasticsearch Sink Connector (Kafka to Elasticsearch)
{
"name": "elasticsearch-sink",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"topics": "db.orders",
"connection.url": "http://elasticsearch:9200",
"type.name": "_doc",
"key.ignore": false,
"schema.ignore": true,
"behavior.on.malformed.documents": "warn"
}
}
# Deploy a connector via the REST API
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d @postgres-source.json
# Check connector status
curl http://localhost:8083/connectors/postgres-source/status
# List all connectors
curl http://localhost:8083/connectors
# Restart a failed task
curl -X POST http://localhost:8083/connectors/postgres-source/tasks/0/restart
10. Kafka Streams
Kafka Streams is a Java library for building stream processing applications that read from and write to Kafka. Unlike Spark or Flink, it requires no separate cluster — it runs inside your application as a library.
// Word count example with Kafka Streams
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Default serdes are required: without them Streams falls back to
// byte-array serdes and fails at runtime on String records
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> input = builder.stream("text-input");
KTable<String, Long> wordCounts = input
.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
.groupBy((key, word) -> word)
.count(Materialized.as("word-counts-store"));
wordCounts.toStream().to("word-counts-output",
Produced.with(Serdes.String(), Serdes.Long()));
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
Windowed Aggregations
// Count events per user in 5-minute tumbling windows
KStream<String, String> events = builder.stream("user-events");
KTable<Windowed<String>, Long> windowedCounts = events
.groupByKey()
.windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
.count(Materialized.as("windowed-counts"));
// Hopping windows: 10-minute window, advancing every 1 minute
TimeWindows hoppingWindow = TimeWindows
.ofSizeWithNoGrace(Duration.ofMinutes(10))
.advanceBy(Duration.ofMinutes(1));
11. Schema Registry and Serialization
Schema Registry provides a centralized repository for message schemas, enabling schema evolution without breaking consumers. It supports Avro, Protobuf, and JSON Schema.
# Register an Avro schema
curl -X POST http://localhost:8081/subjects/user-events-value/versions \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{
"schema": "{\"type\":\"record\",\"name\":\"UserEvent\",\"fields\":[{\"name\":\"userId\",\"type\":\"string\"},{\"name\":\"event\",\"type\":\"string\"},{\"name\":\"timestamp\",\"type\":\"long\"}]}"
}'
# Check compatibility before evolving a schema
curl -X POST http://localhost:8081/compatibility/subjects/user-events-value/versions/latest \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{
"schema": "{\"type\":\"record\",\"name\":\"UserEvent\",\"fields\":[{\"name\":\"userId\",\"type\":\"string\"},{\"name\":\"event\",\"type\":\"string\"},{\"name\":\"timestamp\",\"type\":\"long\"},{\"name\":\"source\",\"type\":[\"null\",\"string\"],\"default\":null}]}"
}'
# List all subjects
curl http://localhost:8081/subjects
# Get latest schema for a subject
curl http://localhost:8081/subjects/user-events-value/versions/latest
Compatibility modes: BACKWARD (new schema can read old data, default), FORWARD (old schema can read new data), FULL (both directions), and NONE (no compatibility check). Use BACKWARD for consumers that must handle historical data, and FULL for maximum safety.
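BACKWARD compatibility hinges on defaults: a reader using the new schema fills in fields absent from old records with their declared defaults. The toy decoder below (not Avro's actual schema-resolution algorithm, and ignoring type checks) illustrates why adding the optional source field above is safe.

```python
# Reader-schema fields and their defaults; "source" is the field added
# in the evolved schema. Real Avro would reject a missing field that
# has no default; this sketch simply uses None throughout.
NEW_SCHEMA = {"userId": None, "event": None, "timestamp": None, "source": None}

def decode(record: dict, reader_schema: dict) -> dict:
    # Toy sketch of schema resolution: known fields are copied, missing
    # ones fall back to the reader schema's default.
    return {name: record.get(name, default) for name, default in reader_schema.items()}

old_record = {"userId": "u1", "event": "login", "timestamp": 1739354400}
decoded = decode(old_record, NEW_SCHEMA)
assert decoded["source"] is None  # old data remains readable: BACKWARD compatible
```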
12. Exactly-Once Semantics
# Producer configuration for exactly-once
enable.idempotence=true
acks=all
retries=2147483647
max.in.flight.requests.per.connection=5
# Transactional producer (Java example)
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("transactional.id", "order-processor-1");
props.put("enable.idempotence", "true");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>("orders", key, value));
producer.send(new ProducerRecord<>("audit-log", key, auditValue));
// Commit consumer offsets as part of the transaction
producer.sendOffsetsToTransaction(offsets, consumerGroupMetadata);
producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException e) {
// Fatal: another producer with the same transactional.id took over
producer.close();
} catch (KafkaException e) {
// Transient error: abort and retry the transaction
producer.abortTransaction();
}
For Kafka Streams, set processing.guarantee=exactly_once_v2 to enable end-to-end exactly-once processing across the entire consume-transform-produce pipeline.
13. Monitoring
Key JMX Metrics
# Broker metrics
kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
kafka.controller:type=KafkaController,name=ActiveControllerCount
kafka.controller:type=KafkaController,name=OfflinePartitionsCount
# Producer metrics
kafka.producer:type=producer-metrics,name=record-send-rate
kafka.producer:type=producer-metrics,name=record-error-rate
# Consumer metrics
kafka.consumer:type=consumer-fetch-manager-metrics,name=records-lag-max
kafka.consumer:type=consumer-fetch-manager-metrics,name=fetch-rate
Consumer Lag Monitoring
# Check lag for all consumer groups
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--describe --all-groups
# Output columns: GROUP, TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG
# Reset offsets (use with caution)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--group my-app-group --topic user-events \
--reset-offsets --to-earliest --execute
# Reset to specific timestamp
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--group my-app-group --topic user-events \
--reset-offsets --to-datetime 2026-02-12T00:00:00.000 --execute
Prometheus + JMX Exporter
# Add JMX Exporter agent to Kafka broker startup
KAFKA_OPTS="-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx_exporter/kafka-broker.yml"
# Minimal kafka-broker.yml for Prometheus
lowercaseOutputName: true
rules:
- pattern: kafka.server<type=(.+), name=(.+)><>Value
name: kafka_server_$1_$2
- pattern: kafka.server<type=(.+), name=(.+), topic=(.+)><>Value
name: kafka_server_$1_$2
labels:
topic: $3
14. Production Best Practices
Partitioning strategy: Use meaningful keys (user ID, order ID) for messages that need ordering guarantees. Use null keys for maximum throughput when ordering does not matter. Avoid hot partitions by choosing keys with high cardinality.
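The cardinality point is easy to see by simulation. This sketch reuses md5 as a stand-in for the murmur2 hash Kafka's default partitioner applies to key bytes.

```python
import hashlib
from collections import Counter

def partition_for(key: str, num_partitions: int = 6) -> int:
    # md5 as a dependency-free stand-in for Kafka's murmur2 partitioner.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % num_partitions

# Low-cardinality key (e.g. a region): at most two partitions ever
# receive data, creating hot partitions while the rest sit idle.
low = Counter(partition_for(k) for k in ["us-east", "eu-west"] * 500)

# High-cardinality key (e.g. a user ID): load spreads across all partitions.
high = Counter(partition_for(f"user-{i}") for i in range(1000))

print(len(low), "partitions used with a low-cardinality key")
print(len(high), "partitions used with a high-cardinality key")
```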
Replication factor: Always use replication.factor=3 in production. Combined with min.insync.replicas=2, this allows one broker failure without data loss or availability impact.
ISR (In-Sync Replicas): Monitor UnderReplicatedPartitions closely. If a follower falls behind, it leaves the ISR. When the ISR shrinks below min.insync.replicas, producers using acks=all receive NotEnoughReplicas errors, so the partition effectively becomes read-only. Common causes: slow disks, network issues, or an overloaded broker.
Disk: Use dedicated SSDs for Kafka log directories. Kafka is sequential-I/O heavy, so throughput matters more than latency. Use multiple log directories on separate disks for higher throughput: log.dirs=/data/kafka-1,/data/kafka-2.
Memory: Kafka relies heavily on the OS page cache. Allocate 6–8 GB for the JVM heap and leave the rest for page cache. A broker with 64 GB of RAM should use -Xmx8g and let the OS cache the remaining 56 GB of log data.
Network: Kafka is network-bound in most deployments. Use 10 Gbps network interfaces and tune socket.send.buffer.bytes and socket.receive.buffer.bytes for high-throughput scenarios.
15. Kafka vs RabbitMQ vs Redis Streams vs Pulsar
| Feature | Kafka | RabbitMQ | Redis Streams | Pulsar |
|---|---|---|---|---|
| Model | Distributed log | Message broker | In-memory stream | Distributed log |
| Throughput | Millions/sec | Tens of thousands/sec | Hundreds of thousands/sec | Millions/sec |
| Message replay | Yes (offset reset) | No (deleted after ack) | Yes (limited by memory) | Yes (offset reset) |
| Ordering | Per-partition | Per-queue | Per-stream | Per-partition |
| Storage | Disk (tiered) | Disk + memory | Memory (+ AOF) | BookKeeper (tiered) |
| Stream processing | Kafka Streams, ksqlDB | None built-in | None built-in | Pulsar Functions |
| Best for | Event streaming, CDC, high volume | Task queues, routing, RPC | Lightweight streaming, caching | Multi-tenant, geo-replication |
16. Troubleshooting
Under-Replicated Partitions
# Check for under-replicated partitions
kafka-topics.sh --bootstrap-server localhost:9092 \
--describe --under-replicated-partitions
# Common causes:
# 1. Broker down or unreachable — check broker logs and network
# 2. Slow disk I/O — check disk utilization with iostat
# 3. Broker overloaded — check CPU and network bandwidth
# 4. Large messages — check max.message.bytes settings
# Reassign partitions to balance load
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
--reassignment-json-file reassignment.json --execute
Consumer Lag Growing
# Diagnose consumer lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--describe --group my-app-group
# Solutions:
# 1. Add more consumers (up to the number of partitions)
# 2. Increase max.poll.records and optimize processing logic
# 3. Increase partitions and consumers together
# 4. Check for slow external calls (DB, HTTP) in consumer code
# 5. Use async processing: poll fast, process in background threads
Consumer Group Rebalancing
# Frequent rebalancing causes:
# 1. Consumer taking too long to process — increase max.poll.interval.ms
# 2. Heartbeat timeout — increase session.timeout.ms
# 3. Consumer instances starting/stopping — stabilize deployment
# Use static group membership to reduce rebalancing
group.instance.id=consumer-host-1 # unique per consumer instance
session.timeout.ms=60000 # longer timeout for static members
# Use cooperative rebalancing (incremental, less disruptive)
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
Frequently Asked Questions
What is the difference between Kafka and RabbitMQ?
Kafka is a distributed log designed for high-throughput event streaming where messages are retained on disk and can be replayed. RabbitMQ is a traditional message broker that routes messages to consumers and deletes them after acknowledgment. Use Kafka when you need event replay, multiple independent consumers reading the same data, very high throughput, or stream processing. Use RabbitMQ when you need complex routing patterns, message priority queues, per-message acknowledgment with redelivery, or lower latency for small workloads.
What is KRaft mode and why should I use it instead of ZooKeeper?
KRaft (Kafka Raft) mode replaces ZooKeeper as the metadata management layer in Kafka. Instead of running a separate ZooKeeper ensemble, Kafka brokers manage their own metadata using the Raft consensus protocol. KRaft eliminates the operational overhead of running and monitoring ZooKeeper, reduces deployment complexity, speeds up topic creation and partition leadership changes, and removes the metadata bottleneck. KRaft has been production-ready since Kafka 3.3 and ZooKeeper support was removed entirely in Kafka 4.0.
How many partitions should I create for a Kafka topic?
The number of partitions determines the maximum parallelism for consumers in a consumer group, since each partition is consumed by exactly one consumer in the group. Start with the number of consumers you plan to run concurrently. For most use cases, 6 to 12 partitions per topic is a good starting point. Too few partitions limits parallelism; too many increases metadata overhead, end-to-end latency, and recovery time after broker failures. You can increase partitions later, but this breaks key-based ordering guarantees.
What does exactly-once semantics mean in Kafka?
Exactly-once semantics (EOS) ensures that even if a producer retries sending a message due to a transient failure, the message is written to the topic exactly once and consumed exactly once. Kafka achieves this through idempotent producers (which deduplicate retries using a producer ID and sequence number) and transactions (which atomically write to multiple partitions and commit consumer offsets). Enable it by setting enable.idempotence=true and using transactions for consume-transform-produce patterns.
How do I monitor consumer lag in Kafka?
Consumer lag is the difference between the latest offset in a partition and the last committed offset of a consumer group. Monitor it using kafka-consumer-groups.sh --describe, which shows the current offset, log-end offset, and lag for each partition. For production monitoring, expose Kafka JMX metrics to Prometheus using the JMX Exporter agent, then track records-lag-max on the consumer side. Set alerts when lag exceeds a threshold that indicates your consumers cannot keep up with producers.
Conclusion
Apache Kafka is the standard platform for event streaming at scale. Start with a single-broker KRaft setup in Docker for development, learn the producer-consumer fundamentals, then layer on Kafka Connect for integrations, Schema Registry for data governance, and Kafka Streams for processing. In production, focus on proper replication, partition sizing, monitoring consumer lag, and keeping brokers healthy with dedicated disks and adequate memory for the OS page cache.
Learn More
- Docker Compose: The Complete Guide — deploy Kafka clusters with Docker Compose
- Redis Complete Guide — compare Kafka with Redis Streams for lightweight messaging
- Prometheus & Grafana Monitoring Guide — monitor Kafka JMX metrics with Prometheus
- JSON Formatter — validate and format Kafka Connect connector configurations