TL;DR
AIR is a high-performance, lightweight dataflow engine built with C++ and MPI. It achieves lower latency and higher throughput than Spark and Flink (by up to 15× in the reported experiments) by using asynchronous communication and eliminating the master-node bottleneck.
Contribution
The paper introduces AIR, a novel dataflow engine designed from scratch with MPI and pthreads, implementing asynchronous routing to improve performance and scalability over existing systems.
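To make "asynchronous routing without a master node" concrete, here is a minimal Python sketch under stated assumptions; it is not AIR's C++/MPI implementation. Threads stand in for MPI processes and `queue.Queue` mailboxes stand in for the network. The names `run_dataflow`, `target_rank`, and the `EOS` marker are hypothetical. Each worker has a sender thread that routes its local records straight to the owning worker by a shared deterministic hash, and a receiver thread that aggregates (here, a word count). Termination is decided locally: a worker stops once every peer has sent it an end-of-stream marker, so no coordinator is involved.

```python
import queue
import threading
import zlib
from collections import Counter

EOS = object()  # hypothetical end-of-stream marker (not AIR's API)


def target_rank(key: str, num_workers: int) -> int:
    # Every worker applies the same deterministic hash, so no master
    # is needed to decide which worker owns a key.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def run_dataflow(partitions):
    """Word count over `partitions` (one list of words per worker)."""
    n = len(partitions)
    mailboxes = [queue.Queue() for _ in range(n)]  # unbounded: put() never blocks
    results = [Counter() for _ in range(n)]

    def sender(rank):
        # Route each local record directly to its owner (fire-and-forget).
        for word in partitions[rank]:
            mailboxes[target_rank(word, n)].put(word)
        # Then announce end-of-stream to every peer, including itself.
        # FIFO order guarantees this arrives after this sender's records.
        for mb in mailboxes:
            mb.put(EOS)

    def receiver(rank):
        # Terminate locally once all n peers have signaled end-of-stream.
        remaining = n
        while remaining:
            msg = mailboxes[rank].get()
            if msg is EOS:
                remaining -= 1
            else:
                results[rank][msg] += 1

    threads = [threading.Thread(target=f, args=(r,))
               for r in range(n) for f in (sender, receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

`run_dataflow` returns one `Counter` per worker, and each word is counted only on the worker that `target_rank` assigns it to. Merging the per-worker counters yields the global counts. In a real MPI deployment, the sender side would presumably use non-blocking calls such as `MPI_Isend`, with dedicated threads for sending and receiving. That pairing is an assumption, not a detail taken from the paper.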
Findings
AIR outperforms Spark and Flink by up to 15 times in latency and throughput.
AIR scales more effectively on clusters of up to 8 nodes and 224 cores.
The architecture reduces control-flow overhead by eliminating the master node.
Abstract
Distributed Stream Processing Systems (DSPSs) are currently among the most rapidly emerging topics in data management, with applications ranging from real-time event monitoring to processing complex dataflow programs and big data analytics. The major market players in this domain are clearly represented by Apache Spark and Flink, which provide a variety of frontend APIs for SQL, statistical inference, machine learning, stream processing, and many others. Yet few details have been reported on the integration of these engines into the underlying High-Performance Computing (HPC) infrastructure and the communication protocols they use. Spark and Flink, for example, are implemented in Java and still rely on a dedicated master node for managing their control flow among the worker nodes in a compute cluster. In this paper, we describe the architecture of our AIR engine, which is designed from…
