Kafka

Kafka is a distributed message queue capable of spreading data and workload across multiple servers(brokers). This enables high availability, fault tolerance, and scalability. Kafka brokers collectively form cluster, and they coordinate with each other to distribute messages.

Kafka messages are published to topics which are then persisted on disk. This enables a broker to withstand system failures. This is essential for ensuring that no data is lost. Kafka can be configured to provide the level of durability you need via replication factors and acknowledgement settings.

Kafka is designed to be horizontally scalable. You can add more machines to the cluster to increase its ability to handle greater volumes of traffic.

Kafka is designed for high throughput and low latency. Its architecture allows for the fast processing of large streams of real-time data. Read/write throughput can be parallelised by dividing topics into partitions and having partitions distributed acros different brokers.

In Kafka, a message is the fundamental unit of data. It is essentially an immutable array of bytes. Once a message is written to a Kafka topic, it cannot be changed or deleted

A topic is a like a category for which messages can be sent. Topics enable messages to be organised and help consumers subscribe they find relevant.

A producer is a process that writes messages to a topic. producers can also decide which partition within a topic to send a message to.

A consumer is a process that subscribes to topics and reads messages from them. Consumers are typically part of a consumer group, which allows for message processing to be load-balanced across multiple consumers.

Brokers are servers that store data and serve clients. Multiple brokers make up a cluster. The broker handles incoming messages from producers, stores them, and serves them to consumers when requested.

Topics are divided into partitions to allow for greater parallelism and scalability. Partitions may be hosted on different servers for fault tolerance and load balancing.

Producers determine which partition a message will be sent to. This is done using a partitioning algorithm. Additionally, a message key can be used to ensure that messages with the same key go to the same partition.

Brokers can fail - to ensure high availability and durability, Kafka replicates each partition across multiple brokers

Leaders - One Leader, N-1 Followers

For each partition, there's a leader broker with N-1 followers. The leader handles all reads + writes while the followers replicate the data.

for a partition, clients interact through the leader only. This simplifies the system architecture and improves performance.

When a leader becomes unreachable, a new leader is elected