Leapfrog Triejoin: a worst-case optimal join algorithm
Todd L. Veldhuizen

TL;DR
The paper proves that leapfrog triejoin is worst-case optimal up to a log factor, extends that optimality to finer-grained classes of database instances, and shows that the algorithm can be implemented with conventional data structures.
Contribution
It establishes the worst-case optimality of leapfrog triejoin, strengthens the NPRR results by proving optimality for finer-grained classes of database instances, and shows the algorithm's practical advantages and simplicity compared to existing algorithms.
Findings
Leapfrog triejoin is worst-case optimal up to a log factor.
It outperforms NPRR on certain classes of database instances.
The algorithm is simple to implement using standard data structures.
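To illustrate the claim about standard data structures, here is a minimal sketch of the single-variable "leapfrog" intersection that triejoin builds on. It assumes each relation is a sorted, duplicate-free Python list, and it uses `bisect_left` as a stand-in for the iterator's seek operation. The function name `leapfrog_intersect` is hypothetical, and this is an illustration under those assumptions rather than the paper's own code.

```python
from bisect import bisect_left

def leapfrog_intersect(*sorted_lists):
    """Intersect sorted, duplicate-free lists by leapfrogging (illustrative sketch)."""
    k = len(sorted_lists)
    if k == 0 or any(not lst for lst in sorted_lists):
        return []
    pos = [0] * k  # current position of each list's iterator
    # Visit iterators in ascending order of their first key; the last one holds the max.
    order = sorted(range(k), key=lambda i: sorted_lists[i][0])
    max_key = sorted_lists[order[-1]][0]
    p = 0  # index into `order` of the iterator holding the smallest key
    out = []
    while True:
        i = order[p]
        lst = sorted_lists[i]
        if lst[pos[i]] == max_key:
            # Smallest key equals largest key, so all iterators agree: emit it.
            out.append(max_key)
            pos[i] += 1  # step past the match
        else:
            # Seek: jump to the first key >= max_key (binary search from pos[i]).
            pos[i] = bisect_left(lst, max_key, pos[i])
        if pos[i] >= len(lst):
            return out  # one iterator is exhausted, no further matches exist
        max_key = lst[pos[i]]  # this iterator now holds the largest key
        p = (p + 1) % k

A = [0, 1, 3, 4, 5, 6, 7, 8, 9, 11]
B = [0, 2, 6, 7, 8, 9]
C = [2, 4, 5, 8, 10]
print(leapfrog_intersect(A, B, C))  # [8]
```

The full triejoin applies this kind of intersection once per query variable over trie-structured relations; that part is not shown here.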
Abstract
Recent years have seen exciting developments in join algorithms. In 2008, Atserias, Grohe and Marx (henceforth AGM) proved a tight bound on the maximum result size of a full conjunctive query, given constraints on the input relation sizes. In 2012, Ngo, Porat, Ré and Rudra (henceforth NPRR) devised a join algorithm with worst-case running time proportional to the AGM bound. Our commercial Datalog system LogicBlox employs a novel join algorithm, leapfrog triejoin, which compared conspicuously well to the NPRR algorithm in preliminary benchmarks. This spurred us to analyze the complexity of leapfrog triejoin. In this paper we establish that leapfrog triejoin is also worst-case optimal, up to a log factor, in the sense of NPRR. We improve on the results of NPRR by proving that leapfrog triejoin achieves worst-case optimality for finer-grained classes of database instances, such…
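As a concrete reading of the AGM bound, consider the triangle query Q(a,b,c) = R(a,b), S(b,c), T(a,c), for which the bound is sqrt(|R|·|S|·|T|). The sketch below is an illustration chosen here, not an example taken from the paper: it evaluates the query naively on a small instance and compares the output size with the bound. The helper names `triangle_join` and `agm_triangle_bound` are hypothetical.

```python
from math import sqrt

def triangle_join(R, S, T):
    """Naively evaluate Q(a,b,c) = R(a,b), S(b,c), T(a,c) (illustrative, not optimal)."""
    s_by_b = {}
    for b, c in S:
        s_by_b.setdefault(b, []).append(c)
    t_set = set(T)
    return [(a, b, c) for a, b in R for c in s_by_b.get(b, []) if (a, c) in t_set]

def agm_triangle_bound(R, S, T):
    """AGM bound for the triangle query: sqrt(|R| * |S| * |T|)."""
    return sqrt(len(R) * len(S) * len(T))

# Assumed test instance: every relation holds all pairs over {0, 1, 2}, so N = 9.
pairs = [(i, j) for i in range(3) for j in range(3)]
result = triangle_join(pairs, pairs, pairs)
print(len(result), agm_triangle_bound(pairs, pairs, pairs))  # 27 27.0
```

On this instance the output size meets the bound exactly (27 = 9^1.5), which shows the bound is tight for the triangle query. An algorithm built only from pairwise joins can produce an intermediate result of size N^2 on other instances of the same query, which is the gap worst-case optimal algorithms close.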
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Algorithms and Data Compression
