Provenance-based Data Skipping (TechReport)

Xing Niu; Ziyu Liu; Pengyuan Li; Boris Glavic

arXiv:2104.12815 · cs.DB · May 31, 2021

Open Access

TL;DR

This paper introduces provenance-based data skipping (PBDS), a novel method that uses provenance sketches to identify relevant data for queries, significantly enhancing query performance in database systems.

Contribution

The paper presents a new provenance sketch technique that enables data skipping for complex queries, leveraging physical design artifacts to improve database query efficiency.

Findings

01. PBDS significantly reduces query execution time.

02. PBDS is effective in both disk-based and main-memory databases.

03. PBDS exploits physical design artifacts such as indexes and zone maps.

Abstract

Database systems analyze queries to determine upfront which data is needed for answering them and use indexes and other physical design techniques to speed up access to that data. However, for important classes of queries, e.g., HAVING and top-k queries, it is impossible to determine upfront what data is relevant. To overcome this limitation, we develop provenance-based data skipping (PBDS), a novel approach that generates provenance sketches to concisely encode what data is relevant for a query. Once a provenance sketch has been captured, it is used to speed up subsequent queries. PBDS can exploit physical design artifacts such as indexes and zone maps. Our approach significantly improves performance for both disk-based and main-memory database systems.
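
The capture-then-reuse idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a sketch is the set of fragments of a range partition that contain at least one row contributing to the result, and it uses a top-k query as the example. The table, the partition bounds, and all function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical table of (id, city, sales) rows; values are illustrative only.
ROWS = [
    (1, "A", 10), (2, "B", 95), (3, "C", 20), (4, "D", 15),
    (5, "E", 30), (6, "F", 5), (7, "G", 90), (8, "H", 25),
    (9, "I", 12), (10, "J", 18), (11, "K", 8), (12, "L", 22),
]

# Assumed range partition on id: fragment i holds ids in [BOUNDS[i], BOUNDS[i+1]).
BOUNDS = [1, 4, 7, 10, 13]


def fragment_of(row_id):
    """Return the index of the range fragment that contains row_id."""
    return bisect_right(BOUNDS, row_id) - 1


def top_k(rows, k=2):
    """The example query: the k rows with the highest sales."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:k]


def capture_sketch(rows, k=2):
    """Run the query once and record which fragments hold result rows."""
    return {fragment_of(r[0]) for r in top_k(rows, k)}


def top_k_with_sketch(rows, sketch, k=2):
    """Re-run the query over only the fragments listed in the sketch.

    This toy filter still iterates over every row. A real system would turn
    the sketch into range predicates on id so that an index or zone map can
    avoid reading the other fragments at all.
    """
    kept = [r for r in rows if fragment_of(r[0]) in sketch]
    return top_k(kept, k), len(kept)


sketch = capture_sketch(ROWS)                      # fragments 0 and 2
result, scanned = top_k_with_sketch(ROWS, sketch)  # 6 of 12 rows survive
print(sorted(sketch), result, scanned)
```

Here the two top rows fall in fragments 0 and 2, so the second run considers 6 of the 12 rows and returns the same answer as the full scan. Reuse is safe in this toy case because a top-k query evaluated over any subset that still contains the original top-k rows yields those same rows. Whether a sketch is safe for other query classes, such as HAVING, depends on the query and on the partitioning attribute.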

Taxonomy

Topics: Scientific Computing and Data Management · Advanced Database Systems and Queries · Advanced Data Storage Technologies