On Efficiently Partitioning a Topic in Apache Kafka
Theofanis P. Raptis, Andrea Passarella

TL;DR
This paper models Kafka topic partitioning, formulates it as an intractable optimization problem, and proposes heuristics that outperform existing recommendations in resource efficiency and constraint satisfaction.
Contribution
The paper introduces a formal model and two heuristics for Kafka topic partitioning, addressing the open problem of distributing a topic efficiently under multiple constraints.
Findings
The proposed heuristics respect replication latency constraints.
The proposed heuristics reduce unavailability time.
The proposed heuristics use system resources more efficiently than existing recommendations.
Abstract
Apache Kafka addresses the general problem of delivering extremely high-volume event data to diverse consumers via a publish-subscribe messaging system. It uses partitions to scale a topic across many brokers so that producers can write data in parallel and consumers can read it in parallel. Even though Apache Kafka provides some out-of-the-box optimizations, it does not strictly define how each topic should be efficiently distributed into partitions. The well-formulated fine-tuning needed to improve the performance of an Apache Kafka cluster is still an open research problem. In this paper, we first model the Apache Kafka topic partitioning process for a given topic. Then, given the set of brokers, constraints and application requirements on throughput, OS load, replication latency and unavailability, we formulate the optimization problem of finding how many…
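To make the sizing question in the abstract concrete, the sketch below implements a commonly cited throughput-based rule of thumb for choosing a partition count: take the larger of target throughput divided by per-partition producer throughput and target throughput divided by per-partition consumer throughput. This is an illustration only and is not the model or the heuristics proposed in the paper; the function and parameter names are hypothetical, and the rule ignores the OS load, replication latency and unavailability constraints that the paper considers.

```python
import math


def min_partitions_for_throughput(target_mb_s: float,
                                  producer_mb_s_per_partition: float,
                                  consumer_mb_s_per_partition: float) -> int:
    """Smallest partition count that meets a target throughput.

    Hypothetical helper: applies the rule of thumb
    max(target / producer_rate, target / consumer_rate), rounded up.
    All rates are in MB/s and must be measured on the actual cluster.
    """
    rates = (target_mb_s,
             producer_mb_s_per_partition,
             consumer_mb_s_per_partition)
    if min(rates) <= 0:
        raise ValueError("all throughput values must be positive")

    # Partitions needed so producers can sustain the target rate.
    needed_for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    # Partitions needed so consumers can keep up with the target rate.
    needed_for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)

    # The slower side (producing or consuming) dictates the count.
    return max(needed_for_producers, needed_for_consumers)


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    print(min_partitions_for_throughput(100, 10, 20))
```

A throughput-only estimate like this gives a lower bound on the partition count; it says nothing about how many partitions a broker can host before the other constraints listed in the abstract are violated.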
