One Join Order Does Not Fit All: Reducing Intermediate Results with Per-Split Query Plans

Yujun He; Hangdong Zhao; Simon Frisk; Yifei Yang; Kevin Kristensen; Paraschos Koutris; Xiangyao Yu

arXiv:2510.25684·cs.DB·October 30, 2025

TL;DR

SplitJoin partitions input tables into heavy and light parts and gives each part its own query plan, which reduces intermediate results and speeds up multi-join queries on existing binary join engines.

Contribution

The paper proposes SplitJoin, a framework that treats split as a first-class query operator, so that different data partitions can use distinct query plans and produce smaller intermediate results.

Findings

01 Achieves 2.1x faster runtime on DuckDB

02 Reduces intermediate sizes by up to 74x

03 Completes 43 social network queries on DuckDB, versus 29 natively

Abstract

Minimizing intermediate results is critical for efficient multi-join query processing. Although the seminal Yannakakis algorithm offers strong guarantees for acyclic queries, cyclic queries remain an open challenge. In this paper, we propose SplitJoin, a framework that introduces split as a first-class query operator. By partitioning input tables into heavy and light parts, SplitJoin allows different data partitions to use distinct query plans, with the goal of reducing intermediate sizes using existing binary join engines. We systematically explore the design space for split-based optimizations, including threshold selection, split strategies, and join ordering after splits. Implemented as a front-end to DuckDB and Umbra, SplitJoin achieves substantial improvements: on DuckDB, SplitJoin completes 43 social network queries (vs. 29 natively), achieving 2.1x faster runtime and 7.9x…
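To make the heavy/light idea concrete, here is a minimal sketch on a triangle query R(a, b), S(b, c), T(c, a). It is not the authors' implementation. The function names, the degree-based split rule, the fixed threshold, and the toy data are all assumptions chosen for illustration. Tuples of R whose b value is frequent in S are treated as heavy and joined with T first. The remaining light tuples are joined with S first.

```python
from collections import Counter, defaultdict


def index_by(rows, pos):
    """Group tuples by the value at position `pos`."""
    idx = defaultdict(list)
    for row in rows:
        idx[row[pos]].append(row)
    return idx


def split_heavy_light(R, S, threshold):
    """Hypothetical split rule: partition R(a, b) by the degree of b in S(b, c)."""
    degree = Counter(b for b, _ in S)
    heavy = [(a, b) for a, b in R if degree[b] > threshold]
    light = [(a, b) for a, b in R if degree[b] <= threshold]
    return heavy, light


def triangles_rs_first(R, S, T):
    """Plan 1: materialize R join S, then filter with T."""
    s_by_b = index_by(S, 0)
    t_set = set(T)
    inter = [(a, b, c) for a, b in R for _, c in s_by_b[b]]
    out = {(a, b, c) for a, b, c in inter if (c, a) in t_set}
    return out, len(inter)


def triangles_rt_first(R, S, T):
    """Plan 2: materialize R join T, then filter with S."""
    t_by_a = index_by(T, 1)
    s_set = set(S)
    inter = [(a, b, c) for a, b in R for c, _ in t_by_a[a]]
    out = {(a, b, c) for a, b, c in inter if (b, c) in s_set}
    return out, len(inter)


def split_join_triangles(R, S, T, threshold):
    """Run plan 1 on the light part of R and plan 2 on the heavy part."""
    heavy, light = split_heavy_light(R, S, threshold)
    out_light, n_light = triangles_rs_first(light, S, T)
    out_heavy, n_heavy = triangles_rt_first(heavy, S, T)
    # heavy and light partition R, so the union is the full query result.
    return out_light | out_heavy, n_light + n_heavy


# Toy skewed instance (made up): b = 0 is frequent in S, a = 2 is frequent in T.
R = [(0, 0), (1, 0), (2, 1)]
S = [(0, c) for c in range(10)] + [(1, 5)]
T = [(3, 0)] + [(c, 2) for c in range(10)]

res_rs, inter_rs = triangles_rs_first(R, S, T)
res_rt, inter_rt = triangles_rt_first(R, S, T)
res_split, inter_split = split_join_triangles(R, S, T, threshold=2)
print(inter_rs, inter_rt, inter_split)
```

On this toy instance the two single-order plans materialize 21 and 11 intermediate tuples, while the split plan materializes 2 and returns the same result. The gap comes from the skew being in different relations for different tuples of R, so neither global join order avoids both hot spots.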
