Provenance-based Data Skipping (TechReport)
Xing Niu, Ziyu Liu, Pengyuan Li, Boris Glavic

TL;DR
This paper introduces provenance-based data skipping (PBDS), a novel method that uses provenance sketches to identify relevant data for queries, significantly enhancing query performance in database systems.
Contribution
The paper presents a new provenance sketch technique that enables data skipping for complex queries, leveraging physical design artifacts to improve database query efficiency.
Findings
PBDS significantly reduces query execution time.
PBDS is effective in both disk-based and main-memory databases.
PBDS exploits physical design artifacts such as indexes and zone maps.
Abstract
Database systems analyze queries to determine up front which data is needed for answering them and use indexes and other physical design techniques to speed up access to that data. However, for important classes of queries, e.g., HAVING and top-k queries, it is impossible to determine up front what data is relevant. To overcome this limitation, we develop provenance-based data skipping (PBDS), a novel approach that generates provenance sketches to concisely encode what data is relevant for a query. Once a provenance sketch has been captured, it is used to speed up subsequent queries. PBDS can exploit physical design artifacts such as indexes and zone maps. Our approach significantly improves performance for both disk-based and main-memory database systems.
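The capture-then-reuse idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation. It assumes a hypothetical table of `(row_id, city, amount)` rows, a range partition on `row_id` (the `BOUNDS` list), and a HAVING query over non-negative amounts, so that dropping rows of non-qualifying groups cannot change the answer. Here a provenance sketch is modeled simply as the set of fragments that contain at least one row contributing to the query result. All names (`ROWS`, `BOUNDS`, `capture_sketch`, `apply_sketch`) are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical table: (row_id, city, amount). Amounts are non-negative (assumption).
ROWS = [
    (1, "A", 50), (2, "B", 5), (3, "A", 70), (4, "C", 10),
    (5, "B", 8), (6, "C", 4), (7, "D", 3), (8, "D", 2),
    (9, "A", 30), (10, "E", 1), (11, "E", 6), (12, "B", 7),
]

# Assumed range partition on row_id: fragment i covers [BOUNDS[i], BOUNDS[i + 1]).
BOUNDS = [1, 4, 7, 10, 13]


def fragment_of(row_id):
    """Return the index of the range fragment that contains row_id."""
    return bisect_right(BOUNDS, row_id) - 1


def having_query(rows, threshold):
    """SELECT city, SUM(amount) FROM rows GROUP BY city HAVING SUM(amount) > threshold."""
    totals = defaultdict(int)
    for _, city, amount in rows:
        totals[city] += amount
    return {city: total for city, total in totals.items() if total > threshold}


def capture_sketch(rows, threshold):
    """Run the query once and record which fragments hold rows of qualifying groups."""
    result = having_query(rows, threshold)
    return {fragment_of(row_id) for row_id, city, _ in rows if city in result}


def apply_sketch(rows, sketch):
    """Keep only rows from fragments in the sketch; all other fragments are skipped."""
    return [row for row in rows if fragment_of(row[0]) in sketch]


sketch = capture_sketch(ROWS, threshold=100)
relevant = apply_sketch(ROWS, sketch)
print("fragments kept:", sorted(sketch))
print("rows scanned:", len(relevant), "of", len(ROWS))
print("same answer:", having_query(relevant, 100) == having_query(ROWS, 100))
```

In this toy setup the filter is a Python list comprehension. In a database system the same restriction would be a range predicate on the partitioning attribute, which is where indexes and zone maps could be used to avoid reading the skipped fragments.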
Taxonomy
Topics: Scientific Computing and Data Management · Advanced Database Systems and Queries · Advanced Data Storage Technologies
