Evaluating Learned Indexes for External-Memory Joins
Yuvaraj Chesetti, Prashant Pandey

TL;DR
This paper evaluates the effectiveness of learned indexes in external-memory join operations, comparing their performance and space efficiency against traditional methods across various datasets and conditions.
Contribution
It provides a comprehensive analysis of learned index-based joins for external-memory scenarios, highlighting their trade-offs and performance relative to traditional join algorithms.
Findings
Learned indexes can trade accuracy for space without significant performance loss.
They produce smaller indexes but incur I/O costs similar to those of B-trees in external-memory joins.
Construction times for learned indexes are about 1000 times longer than those of traditional indexes.
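The accuracy-for-space trade-off in the first finding can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a greedy "shrinking cone" piecewise-linear model over sorted, distinct integer keys, and the names build_segments, lookup, and eps are hypothetical. A larger error bound eps typically yields fewer segments (a smaller index), while each lookup must then search a wider window around the predicted position.

```python
import bisect
import random


def build_segments(keys, eps):
    """Greedy shrinking-cone segmentation over sorted, distinct keys.

    Each segment is (first_key, slope, first_pos) and predicts the position
    of every key it covers to within +/- eps (up to float rounding).
    """
    segments = []
    start, n = 0, len(keys)
    while start < n:
        k0 = keys[start]
        lo, hi = 0.0, float("inf")  # feasible slope range (the "cone")
        end = start + 1
        while end < n:
            dx, dy = keys[end] - k0, end - start
            new_lo = max(lo, (dy - eps) / dx)
            new_hi = min(hi, (dy + eps) / dx)
            if new_lo > new_hi:  # cone is empty: close the segment here
                break
            lo, hi = new_lo, new_hi
            end += 1
        # A single-key segment never narrows the cone; any slope works.
        slope = 0.0 if hi == float("inf") else (lo + hi) / 2
        segments.append((k0, slope, start))
        start = end
    return segments


def lookup(keys, segments, first_keys, key, eps):
    """Return the position of key in keys, or -1 if it is absent."""
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return -1
    k0, slope, p0 = segments[i]
    pred = p0 + round(slope * (key - k0))
    # Search only a small window around the prediction; the extra +/- 1
    # absorbs floating-point rounding in the slope.
    lo = max(0, pred - eps - 1)
    hi = min(len(keys), pred + eps + 2)
    j = bisect.bisect_left(keys, key, lo, hi)
    return j if j < len(keys) and keys[j] == key else -1


random.seed(0)
keys = sorted(random.sample(range(10**7), 20000))
for eps in (4, 64):
    segs = build_segments(keys, eps)
    print(f"eps={eps}: {len(segs)} segments for {len(keys)} keys")
```

In an external-memory setting, the search window of roughly 2 * eps positions would correspond to the range of records read from storage, which is one way to see why a coarser model does not necessarily cost more I/O as long as the window stays within a page or two. That reading is an interpretation of the sketch, not a result taken from the paper.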
Abstract
Joins are among the most time-consuming and data-intensive operations in relational query processing. Much research effort has been applied to the optimization of join processing due to their frequent execution. Recent studies have shown that CDF-based learned models can create smaller and faster indexes, accelerating in-memory joins. However, their effectiveness for external-memory joins, which are crucial for large-scale databases, remains underexplored. This paper evaluates the impact of learned indexes on external-memory joins for both sorted and unsorted data. We compare learned index-based joins against traditional join methods such as hash joins, sort joins, and indexed nested-loop joins on real-world and simulated datasets. Additionally, we analyze learned index-based joins across multiple dimensions, including storage device types, data sorting, parallelism, constrained memory…
Taxonomy
Topics: Semantic Web and Ontologies
