M2: An Analytic System with Specialized Storage Engines for Multi-Model Workloads
Kyoseung Koo, Bogyeong Kim, Bongki Moon

TL;DR
M2 is a multi-model analytic system that integrates specialized storage engines with a novel multi-stage hash join algorithm, achieving up to 188x speedup over existing approaches on multi-model workloads.
Contribution
M2 introduces a unified system with specialized storage engines and a new inter-model join algorithm for efficient multi-model data analytics.
Findings
Up to 188x speedup over existing approaches
Effective handling of multiple data models within a single system
Introduction of multi-stage hash join for cross-model data integration
Abstract
Modern data analytic workloads increasingly require handling multiple data models simultaneously. Two primary approaches meet this need: polyglot persistence and multi-model database systems. Polyglot persistence employs a coordinator program to manage several independent database systems but suffers from high communication costs due to its physically disaggregated architecture. Meanwhile, existing multi-model database systems rely on a single storage engine optimized for a specific data model, resulting in inefficient processing across diverse data models. To address these limitations, we present M2, a multi-model analytic system with integrated storage engines. M2 treats all data models as first-class entities, composing query plans that incorporate operations across models. To effectively combine data from different models, the system introduces a specialized inter-model join…
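To make the idea of combining data across models concrete, here is a minimal sketch of a textbook build/probe hash join between relational rows and nested documents. It is not M2's multi-stage hash join, whose details are not given in the abstract above. The function name `hash_join`, the sample data, and the key extractors are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Generic build/probe hash join (illustrative; not M2's algorithm).

    build_key and probe_key are functions that extract the join key, so
    each side may come from a different data model (flat row, nested
    document, and so on).
    """
    # Build phase: index the build side by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[build_key(row)].append(row)

    # Probe phase: look up each probe record and emit merged matches.
    joined = []
    for rec in probe_rows:
        for match in table.get(probe_key(rec), []):
            joined.append({**match, **rec})
    return joined


# Hypothetical relational side: flat customer rows.
customers = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Lin"},
]

# Hypothetical document side: orders with a nested customer reference.
orders = [
    {"order": {"id": "o1", "customer": 1}, "total": 30},
    {"order": {"id": "o2", "customer": 1}, "total": 12},
    {"order": {"id": "o3", "customer": 3}, "total": 5},
]

joined = hash_join(
    customers,
    orders,
    build_key=lambda row: row["cust_id"],
    probe_key=lambda doc: doc["order"]["customer"],
)
```

In this sketch the key-extractor functions hide the difference between the two models. A system with integrated storage engines could presumably run such a join in one process, without the coordinator round trips that the abstract attributes to polyglot persistence.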
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Data Management and Algorithms
