DeepDB: Learn from Data, not from Queries!
Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, Carsten Binnig

TL;DR
DeepDB introduces a data-driven learning approach for database components that avoids costly workload-specific training, enabling better accuracy and adaptability to data and workload changes.
Contribution
The paper presents a novel data-driven model for database tasks that generalizes well and eliminates the need for workload-specific training.
Findings
Outperforms state-of-the-art workload-driven models in accuracy.
Supports ad-hoc queries and data updates without full retraining.
Generalizes better to unseen queries.
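To make the data-driven idea concrete, here is a minimal sketch of a cardinality estimator that is built only from the rows of a table, never from executed queries. It is not DeepDB's model: it is a deliberately simple stand-in that keeps per-column value counts and assumes the columns are independent. The class names `ColumnCounts` and `DataDrivenEstimator` are hypothetical and chosen only for illustration.

```python
from collections import Counter


class ColumnCounts:
    """Value counts for one column (toy stand-in for a learned density model)."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def insert(self, value):
        # Incremental update: no retraining, just bump a counter.
        self.counts[value] += 1
        self.total += 1

    def selectivity(self, low, high):
        # Fraction of rows whose value lies in the inclusive range [low, high].
        if self.total == 0:
            return 0.0
        matching = sum(c for v, c in self.counts.items() if low <= v <= high)
        return matching / self.total


class DataDrivenEstimator:
    """Estimates result sizes from data alone (assumption: independent columns)."""

    def __init__(self, columns):
        self.columns = {name: ColumnCounts() for name in columns}
        self.num_rows = 0

    def insert(self, row):
        # A data update touches only the per-column counts.
        for name, col in self.columns.items():
            col.insert(row[name])
        self.num_rows += 1

    def estimate(self, predicates):
        # predicates maps a column name to an inclusive (low, high) range.
        # Any conjunction of range predicates can be answered ad hoc.
        selectivity = 1.0
        for name, (low, high) in predicates.items():
            selectivity *= self.columns[name].selectivity(low, high)
        return selectivity * self.num_rows


if __name__ == "__main__":
    est = DataDrivenEstimator(["age", "salary"])
    for age, salary in [(20, 100), (30, 200), (40, 300), (50, 400)]:
        est.insert({"age": age, "salary": salary})
    print(est.estimate({"age": (20, 30)}))
    print(est.estimate({"age": (20, 30), "salary": (100, 200)}))
```

In this toy table the two columns are perfectly correlated, so the independence assumption underestimates the second query (two rows actually match). Capturing such correlations while staying data-driven is the kind of problem a richer learned model has to solve.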
Abstract
The typical approach for learned DBMS components is to capture the behavior by running a representative set of queries and using the observations to train a machine learning model. This workload-driven approach, however, has two major downsides. First, collecting the training data can be very expensive, since all queries need to be executed on potentially large databases. Second, training data has to be recollected when the workload or the data changes. To overcome these limitations, we take a different route: we propose to learn a purely data-driven model that can be used for different tasks such as query answering or cardinality estimation. This data-driven model also supports ad-hoc queries and updates of the data without the need for full retraining when the workload or data changes. Indeed, one may now expect that this comes at a price of lower accuracy since workload-driven models can…
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Database Systems and Queries · Data Management and Algorithms
