Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability

Sepanta Zeighami; Cyrus Shahabi

arXiv:2411.06241·cs.LG·November 12, 2024

Open Access

TL;DR

This paper provides the first theoretical analysis of learned database operations under distribution shifts, establishing bounds and conditions for their performance advantages over traditional methods.

Contribution

It introduces the distribution learnability framework and offers the first theoretical bounds on learned database operations in dynamic datasets.

Findings

01

Learned models can outperform non-learned methods under certain distribution conditions.

02

Theoretical bounds explain when and why learned models are advantageous.

03

The framework provides foundational tools for analyzing learned database operations.

Abstract

Use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, is shown to provide substantial performance benefits. However, when datasets change and data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than non-learned alternatives. This, together with a lack of theoretical understanding of learned methods, undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment. In this paper, we present the first known theoretical characterization of the performance of learned models in dynamic datasets, for the aforementioned operations. Our results show novel theoretical characteristics achievable by learned models and provide bounds on the performance of the models that characterize their advantages over non-learned…

Taxonomy

Topics: Advanced Database Systems and Queries · Neural Networks and Applications · Machine Learning and Algorithms