Kafka load balancing with Akka


At leveris, we use modern technologies that are distributed by nature. Our applications are designed to scale easily, without a rewrite or even a change to the architecture. To achieve this, we use scalable technologies such as Akka and Kafka.

Akka is a toolkit for highly scalable, distributed applications. It offers an easy-to-work-with concurrency model based on actors, so the developer can write highly concurrent apps without working with threads directly. In practice, many threads run on each machine, and our production deployment has more than three machines.

Apache Kafka (or Kafka for short) is distributed by nature, much like Akka. It manages topics. Each topic is split into multiple partitions, which are stored on multiple servers with multiple copies of the data for failover. Clients called producers and consumers connect to Kafka to work with topics: producers write data to a topic, and consumers read that data at the other end. Both producers and consumers can be other enterprise apps that run on multiple servers with multiple threads, as Akka apps do.
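To make the idea of partitions concrete, here is a minimal in-memory sketch of a topic whose messages are spread over partitions by key. The names Topic, choose_partition and append are made up for this illustration, and the CRC32 hash is only a stand-in (Kafka's default partitioner hashes keyed records with murmur2). Nothing here talks to a real broker.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Assumption: CRC32 is used purely for illustration; it is not Kafka's hash.
    """
    return zlib.crc32(key) % num_partitions


class Topic:
    """Hypothetical in-memory stand-in for a topic: one list per partition."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: str):
        """Store the message in exactly one partition; return (partition, offset)."""
        partition = choose_partition(key, len(self.partitions))
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset
```

Messages with the same key always land in the same partition, and the offset is simply the position of the message within that partition.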

In the end, an Akka application is configured with connections to several topics. To read every partition in parallel, there must be as many consumer threads as there are partitions in each connected topic. When a producer sends a message to a Kafka topic, it is stored in one of the topic's partitions. We have to make sure that the message is processed by the consumer exactly once, no more, no less. In practice, this means that a message must not be lost under any circumstances, but it may be delivered more than once in the event of infrastructure problems. Each message therefore carries a unique identifier, which lets the consumer detect a duplicate and avoid processing it again. The distributed nature of Akka and Kafka makes it possible to implement an architecture with no data loss. This is what we need.
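The role of the unique message identifier can be sketched as follows. This is an illustration only, not our production code: IdempotentProcessor and its methods are hypothetical names, and the set of seen identifiers lives in memory here, whereas a real system would have to keep it in durable storage shared by all nodes.

```python
class IdempotentProcessor:
    """Runs a handler at most once per message ID, even if delivery repeats."""

    def __init__(self, handler):
        self._handler = handler
        # Assumption: in-memory only; a real deployment needs durable storage.
        self._seen = set()

    def process(self, message_id: str, payload) -> bool:
        """Return True if the payload was handled, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeded, so that a failed
        # attempt is retried when the message is delivered again.
        self._seen.add(message_id)
        return True
```

A short usage example: if the same message arrives twice, the handler runs only for the first copy.

```python
handled = []
processor = IdempotentProcessor(handled.append)
processor.process("m-1", "debit 10")   # handled
processor.process("m-1", "debit 10")   # duplicate, skipped
```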

The question is how to deliver each message just once to every consumer of a topic. For reasons of high availability, we want all members of the Akka cluster (or at least more than one of them) to be connected to the Kafka topic. If an Akka node goes down, another node takes over the job and processes the messages. This is also good for load balancing. But how do we coordinate the delivery of messages when multiple Akka nodes are connected to one topic? This is where the high-level Kafka client driver comes in. This driver manages the consumer's position in each topic and topic partition. It keeps the position in durable storage, so the position survives a consumer disconnecting and reconnecting. There are numerous options to set for the high-level client, including the so-called group.id.
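The position tracking described above can be pictured with a small sketch. It is a simplified model under stated assumptions, not the Kafka client API: OffsetStore and poll are hypothetical names, the partition is a plain Python list, and the store is an in-memory dictionary standing in for durable storage.

```python
class OffsetStore:
    """Remembers, per (group, topic, partition), the next offset to read."""

    def __init__(self):
        # Assumption: a dict stands in for the durable storage of a real setup.
        self._committed = {}

    def commit(self, group_id: str, topic: str, partition: int, next_offset: int):
        self._committed[(group_id, topic, partition)] = next_offset

    def position(self, group_id: str, topic: str, partition: int) -> int:
        # A group that has never committed starts at the beginning of the log.
        return self._committed.get((group_id, topic, partition), 0)


def poll(log, store, group_id, topic, partition, max_records):
    """Return (start_offset, records), reading from the group's saved position."""
    start = store.position(group_id, topic, partition)
    return start, log[start:start + max_records]
```

A consumer that reconnects simply continues from the last committed position. If it crashes after processing a batch but before committing, the same batch is read again, which is why duplicates have to be tolerated.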

The group.id attribute identifies the consuming application (in Kafka terms, the consumer group) rather than a single connection. Kafka delivers each message just once to each application identified by a group.id. If multiple consumers connect with the same group.id, the topic's partitions, and with them the messages, are balanced among all connected clients. For us, this means that all the nodes of one Akka application connect to the Kafka topic with the same group.id. Kafka balances the messages, and we can freely restart Akka nodes without the threat of message loss or a significant performance impact. We at leveris thank the Kafka developers for making our life easier!
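The balancing works at the level of partitions: within one group, each partition is read by exactly one consumer at a time. The sketch below shows the idea with a simple round-robin split. The function assign_partitions and the node names are hypothetical, and Kafka's real assignment strategies are more elaborate than this.

```python
def assign_partitions(partitions, members):
    """Spread partitions round-robin over the members of one consumer group.

    Simplified illustration; not the algorithm Kafka itself runs.
    """
    members = sorted(members)
    if not members:
        raise ValueError("a consumer group needs at least one member")
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


# Three Akka nodes share six partitions of one topic.
before = assign_partitions(range(6), ["node-a", "node-b", "node-c"])

# node-b is restarted: its partitions are redistributed to the remaining nodes.
after = assign_partitions(range(6), ["node-a", "node-c"])
```

Every partition always has an owner, so no messages are left unread while a node is down; the remaining nodes simply take on a larger share.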
