Multi-threaded Apache Kafka consumers using confluent-kafka-python with ThreadPoolExecutor
After the batch-polling Kafka consumers were completed, they were happily deployed to production. Most consumers performed pretty well, but the ones handling higher throughput did not.
The consumer group lag increased dramatically; it could reach a million messages as the consumers failed to keep up. The only thing we could do at that point was to increase the number of partitions of the topic and, at the same time, increase the number of running consumers to match the number of partitions. But to what extent?
Messages already sitting in an old partition are not "load-balanced" into a newer partition, so the consumer lag would not be resolved any time soon. Only new messages assigned to the new partitions would be consumed sooner, as there are not many messages in those partitions yet.
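To make the point concrete, here is a rough back-of-the-envelope sketch. The backlog sizes, the consumption rate, and the helper name drain_seconds are all hypothetical, and the sketch ignores new incoming traffic for simplicity. It assumes one consumer per partition.

```python
def drain_seconds(backlog_per_partition, msgs_per_second_per_consumer):
    """Time until the slowest partition is caught up.

    Assumes one consumer per partition and no new incoming messages
    (a simplification for illustration only).
    """
    return max(backlog_per_partition.values()) / msgs_per_second_per_consumer


# Hypothetical numbers: two partitions with 500,000 unconsumed messages each.
before = {0: 500_000, 1: 500_000}

# Adding two partitions does not move the existing backlog; they start empty.
after = {**before, 2: 0, 3: 0}

print(drain_seconds(before, 1_000))  # 500.0
print(drain_seconds(after, 1_000))   # 500.0
```

Under these assumptions the old partitions take just as long to drain after the topic is expanded, because each is still read by a single consumer.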
Multi-threaded Kafka Consumer
Start a thread pool with max_workers equal to the number of partitions your topic has. Assuming the topic has 10 partitions, the messages of each partition are submitted to their own thread and the partitions are processed concurrently.
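One way this could look is sketched below; it is not necessarily the exact implementation used in production. The helper names process_partition and process_batch are hypothetical. The only thing assumed about each message is that it exposes partition() and value() methods, as confluent-kafka-python's Message objects do.

```python
import concurrent.futures
from collections import defaultdict


def process_partition(partition, messages):
    """Hypothetical handler: process one partition's messages in order."""
    return partition, [msg.value() for msg in messages]


def process_batch(messages, max_workers=10):
    """Process a polled batch with one task per partition."""
    # Group the batch by partition so ordering within a partition is preserved.
    by_partition = defaultdict(list)
    for msg in messages:
        by_partition[msg.partition()].append(msg)

    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit each partition's messages as a separate task.
        futures = [
            executor.submit(process_partition, partition, msgs)
            for partition, msgs in by_partition.items()
        ]
        for future in concurrent.futures.as_completed(futures):
            # result() re-raises any exception raised inside the handler.
            partition, values = future.result()
            results[partition] = values
    return results
```

In a real consumer, the batch would typically come from Consumer.consume(num_messages=..., timeout=...), and offsets would be committed only after every future has completed, so that a failed partition is not marked as consumed.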
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor: