One Join Order Does Not Fit All: Reducing Intermediate Results with Per-Split Query Plans
Yujun He, Hangdong Zhao, Simon Frisk, Yifei Yang, Kevin Kristensen, Paraschos Koutris, Xiangyao Yu

TL;DR
SplitJoin partitions input tables into heavy and light parts and lets each partition use its own query plan, reducing intermediate results and speeding up multi-join queries on existing binary join engines.
Contribution
The paper proposes SplitJoin, a framework that treats split as a first-class query operator so that different data partitions can be evaluated with distinct query plans.
Findings
Achieves 2.1x faster runtime on DuckDB
Reduces intermediate sizes by up to 74x
Completes 43 social network queries on DuckDB (vs. 29 natively)
Abstract
Minimizing intermediate results is critical for efficient multi-join query processing. Although the seminal Yannakakis algorithm offers strong guarantees for acyclic queries, cyclic queries remain an open challenge. In this paper, we propose SplitJoin, a framework that introduces split as a first-class query operator. By partitioning input tables into heavy and light parts, SplitJoin allows different data partitions to use distinct query plans, with the goal of reducing intermediate sizes using existing binary join engines. We systematically explore the design space for split-based optimizations, including threshold selection, split strategies, and join ordering after splits. Implemented as a front-end to DuckDB and Umbra, SplitJoin achieves substantial improvements: on DuckDB, SplitJoin completes 43 social network queries (vs. 29 natively), achieving 2.1x faster runtime and 7.9x…
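The heavy/light idea described in the abstract can be illustrated with a minimal sketch on the triangle query Q(a, b, c) = R(a, b), S(b, c), T(c, a). This is not the paper's implementation: the function names, the fixed degree threshold, the choice to split R on the degree of b in S, and the two join orders are all assumptions made for illustration. The sketch only shows how giving each partition its own binary join order can shrink the intermediate result compared with a single plan.

```python
from collections import Counter, defaultdict


def hash_join(left, right, lkey, rkey):
    """Binary hash join: concatenate tuples whose key columns match."""
    index = defaultdict(list)
    for r in right:
        index[r[rkey]].append(r)
    return [l + r for l in left for r in index.get(l[lkey], ())]


def triangles_single_plan(R, S, T):
    """One join order for all data: (R join S) then T."""
    rs = hash_join(R, S, 1, 0)  # tuples (a, b, b, c)
    t_set = set(T)
    out = {(a, b, c) for a, b, _, c in rs if (c, a) in t_set}
    return out, len(rs)  # result and intermediate size


def triangles_split(R, S, T, threshold):
    """Hypothetical per-split plan: split R by the degree of b in S."""
    deg_s = Counter(b for b, _ in S)  # missing values count as degree 0
    r_light = [t for t in R if deg_s[t[1]] <= threshold]
    r_heavy = [t for t in R if deg_s[t[1]] > threshold]

    # Light part: b has few partners in S, so joining R with S first is cheap.
    rs = hash_join(r_light, S, 1, 0)  # tuples (a, b, b, c)
    t_set = set(T)
    light = {(a, b, c) for a, b, _, c in rs if (c, a) in t_set}

    # Heavy part: avoid S at first; join R with T on a, then check S.
    rt = hash_join(r_heavy, T, 0, 1)  # tuples (a, b, c, a)
    s_set = set(S)
    heavy = {(a, b, c) for a, b, c, _ in rt if (b, c) in s_set}

    return light | heavy, len(rs) + len(rt)


# Toy data (made up): b = 0 is a hub with 50 partners in both R and S.
R = [(a, 0) for a in range(50)] + [(100, 1)]
S = [(0, c) for c in range(50)] + [(1, 200)]
T = [(200, 100), (0, 0)]

print(triangles_single_plan(R, S, T))
print(triangles_split(R, S, T, threshold=2))
```

Both functions return the same set of triangles. On this toy input the single plan materializes every pairing through the hub value, while the split plan sends the hub tuples through T first. Whether a split helps on real data depends on the threshold, the split strategy, and the join order chosen after the split, which is the design space the abstract says the paper explores.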
