The Case for Learned In-Memory Joins

Ibrahim Sabek; Tim Kraska

arXiv:2111.08824 · cs.DB · March 10, 2022

TL;DR

This paper studies whether CDF-based learned models can practically speed up in-memory joins across the three join categories (INLJ, SJ, and HJ), and reports that optimized learned variants of INLJ and SJ consistently outperform state-of-the-art techniques.

Contribution

To the authors' knowledge, it is the first study to evaluate CDF-based learned models for in-memory joins, and it proposes learned variants of INLJ and SJ that outperform state-of-the-art techniques.

Findings

01 Learned variants improve INLJ and SJ performance

02 Proposed models outperform state-of-the-art methods

03 Demonstrates practical benefits of learned models in joins

Abstract

In-memory join is an essential operator in any database engine and has been extensively investigated in the database literature. In this paper, we study whether exploiting CDF-based learned models to boost join performance is practical. To the best of our knowledge, we are the first to fill this gap. We investigate the use of CDF-based partitioning and learned indexes (e.g., Recursive Model Indexes (RMI) and RadixSpline) in the three join categories: indexed nested loop join (INLJ), sort-based joins (SJ), and hash-based joins (HJ). Our study shows that there is room to improve the performance of the INLJ and SJ categories through our proposed optimized learned variants. Our experimental analysis shows that these proposed learned variants of INLJ and SJ consistently outperform state-of-the-art techniques.
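To make the idea of a CDF-based learned index inside an INLJ concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single least-squares linear model as a stand-in for RMI or RadixSpline, and the names `fit_linear_cdf`, `learned_lookup`, and `learned_inlj` are hypothetical. The model predicts a key's position in the sorted inner relation, and a binary search bounded by the model's maximum error corrects the guess.

```python
import bisect

def fit_linear_cdf(sorted_keys):
    """Fit position ~ slope * key + intercept and record a safe error bound.

    Assumption: one linear model stands in for a real learned index.
    """
    n = len(sorted_keys)
    if n == 0:
        return 0.0, 0.0, 1
    mean_k = sum(sorted_keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in sorted_keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(sorted_keys))
    slope = cov / var if var else 0.0
    intercept = mean_p - slope * mean_k
    max_err = max(abs(i - (slope * k + intercept))
                  for i, k in enumerate(sorted_keys))
    # Round the bound up so truncating the prediction cannot miss a key.
    return slope, intercept, int(max_err) + 1

def learned_lookup(sorted_keys, model, key):
    """Return the first position of key in sorted_keys, or -1 if absent."""
    slope, intercept, err = model
    n = len(sorted_keys)
    guess = int(slope * key + intercept)
    # Clamp the search window [lo, hi) to valid positions.
    lo = max(0, min(n, guess - err))
    hi = max(lo, min(n, guess + err + 1))
    i = bisect.bisect_left(sorted_keys, key, lo, hi)
    return i if i < n and sorted_keys[i] == key else -1

def learned_inlj(outer_keys, inner_sorted):
    """Indexed nested loop join: probe the learned index once per outer key.

    Returns (outer_key, inner_position) pairs, including duplicate matches.
    """
    model = fit_linear_cdf(inner_sorted)
    result = []
    for k in outer_keys:
        i = learned_lookup(inner_sorted, model, k)
        while i != -1 and i < len(inner_sorted) and inner_sorted[i] == k:
            result.append((k, i))
            i += 1
    return result
```

For example, `learned_inlj([4, 5, 21, 2], [2, 4, 4, 7, 9, 15, 21])` probes the inner relation four times and emits one pair per matching inner position. The tighter the model fits the key distribution, the smaller the error window and the cheaper each probe, which is the property a learned index relies on.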

Taxonomy

Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Data Stream Mining Techniques