Query By Provenance

Daniel Deutch; Amir Gilad

arXiv:1602.03819 · cs.DB · May 17, 2016

TL;DR

This paper introduces a query-formulation framework in which users explain their example output tuples. The explanations are compiled into a data provenance model, which narrows the set of inferred queries toward the one the user intends; the framework comes with an intuitive interface.

Contribution

It presents a novel approach combining explanations and data provenance to infer more accurate conjunctive queries, with proven efficiency and a practical system prototype.

Findings

01 Enhanced query accuracy with user explanations
02 Efficient algorithms with proven computational properties
03 Positive user study and benchmark results

Abstract

To assist non-specialists in formulating database queries, multiple frameworks that automatically infer queries from a set of examples have been proposed. While highly useful, a shortcoming of the approach is that if users can only provide a small set of examples, many inherently different queries may qualify, and only some of these actually match the user intentions. Our main observation is that if users further explain their examples, the set of qualifying queries may be significantly more focused. We develop a novel framework where users explain example tuples by choosing input tuples that are intuitively the "cause" for their examples. Their explanations are automatically "compiled" into a formal model for explanations, based on previously developed models of data provenance. Then, our novel algorithms infer conjunctive queries from the examples and their explanations. We prove the…
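To make the core idea concrete, here is a minimal Python sketch of turning one example tuple and its explanation into a conjunctive query. It is not the algorithm from the paper (its details are not reproduced on this page); it uses a simple "replace constants with variables" construction and treats an explanation as a plain set of input tuples rather than the paper's provenance model. The toy database `DB` and the helpers `infer_cq`, `unify`, and `evaluate` are hypothetical names introduced only for illustration.

```python
from itertools import product

# Toy database (hypothetical data): relation name -> list of tuples.
DB = {
    "Person": [("alice", "paris"), ("bob", "rome"), ("dave", "oslo")],
    "Likes": [("alice", "jazz"), ("bob", "rock")],
}


def infer_cq(example, explanation):
    """Turn one example output tuple plus its explanation into a conjunctive query.

    `explanation` is a list of (relation, tuple) pairs the user marked as the
    "cause" of `example`. Every distinct constant becomes a variable, so values
    shared between explanation tuples turn into join conditions.
    """
    var_of = {}
    for _, tup in explanation:
        for value in tup:
            var_of.setdefault(value, f"x{len(var_of)}")
    body = [(rel, tuple(var_of[v] for v in tup)) for rel, tup in explanation]
    uncovered = [v for v in example if v not in var_of]
    if uncovered:
        raise ValueError(f"example values not found in explanation: {uncovered}")
    head = tuple(var_of[v] for v in example)
    return head, body


def unify(variables, tup, binding):
    """Extend `binding` so that `variables` match `tup`; return None on a conflict."""
    if len(variables) != len(tup):
        return None
    extended = dict(binding)
    for var, value in zip(variables, tup):
        if extended.setdefault(var, value) != value:
            return None
    return extended


def evaluate(query, db):
    """Naive evaluation: try every combination of tuples for the body atoms."""
    head, body = query
    answers = set()
    for choice in product(*(db[rel] for rel, _ in body)):
        binding = {}
        for (_, variables), tup in zip(body, choice):
            binding = unify(variables, tup, binding)
            if binding is None:
                break
        if binding is not None:
            answers.add(tuple(binding[var] for var in head))
    return answers


# The user gives the example ("alice",) and explains it with two input tuples.
q = infer_cq(("alice",), [("Person", ("alice", "paris")), ("Likes", ("alice", "jazz"))])
print(q)  # (('x0',), [('Person', ('x0', 'x1')), ('Likes', ('x0', 'x2'))])
print(sorted(evaluate(q, DB)))  # [('alice',), ('bob',)]
```

The second print shows the effect the abstract describes: because the explanation includes a `Likes` tuple, the inferred query requires a join and excludes `dave`, whereas a query built from the `Person` tuple alone would also return him. A real system must also decide which coincidental value equalities are intended joins; this sketch simply treats every shared value as one.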

Taxonomy

Topics: Scientific Computing and Data Management · Advanced Database Systems and Queries · Data Quality and Management