Modularis: Modular Relational Analytics over Heterogeneous Distributed Platforms
Dimitrios Koutsoukos, Ingo Müller, Renato Marroquín, Ana Klimovic, Gustavo Alonso

TL;DR
Modularis introduces a modular execution layer for data analytics that uses sub-operators, enabling flexible, efficient, and portable query processing across diverse hardware platforms with minimal code changes.
Contribution
The paper presents Modularis, a novel sub-operator based architecture that simplifies porting and extending distributed query systems across heterogeneous hardware.
Findings
Modularis outperforms Presto, SingleStore, Athena, and BigQuery in end-to-end performance.
Minimal code changes are needed to adapt Modularis to different hardware platforms.
Modularis is easily extensible to various join types and group-by queries.
Abstract
The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators optimized for the underlying hardware. While effective in the short term, such an approach makes the operators cumbersome to port and adapt, which is increasingly required due to the speed at which algorithms and hardware evolve. To address this limitation, we present Modularis, an execution layer for data analytics based on sub-operators, i.e., composable building blocks resembling traditional database operators but at a finer granularity. To demonstrate the advantages of our approach, we use Modularis to build a distributed query processing system supporting relational queries running on an RDMA cluster, a serverless cloud platform, and a smart…
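The sub-operator idea can be illustrated with a small sketch. The code below is not taken from Modularis; every name in it (`scan`, `partition`, `local_exchange`, `build_probe`, `partitioned_join`) is hypothetical. It assumes a join can be assembled from small composable pieces, with the data-movement step (`exchange`) being the only part that would need to change between platforms.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple  # a row is a plain tuple; column 0 is the join key in this sketch


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Sub-operator: stream rows from an in-memory source."""
    yield from rows


def partition(rows: Iterable[Row], key: Callable[[Row], int], n: int) -> List[List[Row]]:
    """Sub-operator: hash-partition rows into n buckets by key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts


def local_exchange(parts: List[List[Row]]) -> List[List[Row]]:
    """Sub-operator: hypothetical platform-specific data movement.

    Here it is an in-process hand-off. In this sketch, a different platform
    would supply a different function with the same signature.
    """
    return parts


def build_probe(build: Iterable[Row], probe: Iterable[Row],
                key: Callable[[Row], int]) -> Iterator[Row]:
    """Sub-operator: hash-join one pair of partitions."""
    table = defaultdict(list)
    for b in build:
        table[key(b)].append(b)
    for p in probe:
        for b in table.get(key(p), []):
            yield b + p


def partitioned_join(left: Iterable[Row], right: Iterable[Row], n: int,
                     exchange: Callable = local_exchange) -> List[Row]:
    """Compose the sub-operators above into a partitioned hash join."""
    key = lambda r: r[0]
    lparts = exchange(partition(scan(left), key, n))
    rparts = exchange(partition(scan(right), key, n))
    out: List[Row] = []
    for lp, rp in zip(lparts, rparts):
        out.extend(build_probe(lp, rp, key))
    return out


left = [(1, "a"), (2, "b")]
right = [(1, "x"), (3, "y"), (1, "z")]
result = sorted(partitioned_join(left, right, n=2))
```

In this sketch, swapping `exchange` for another function with the same signature leaves `partition` and `build_probe` untouched, which mirrors the abstract's claim that sub-operators are composable building blocks at a finer granularity than traditional operators.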
Taxonomy
Topics: Advanced Database Systems and Queries · Cognitive Computing and Networks · Distributed systems and fault tolerance
