The Exception that Improves the Rule
Juliana Freire, Boris Glavic, Oliver Kennedy, Heiko Mueller

TL;DR
This paper introduces Vizier, a system that combines spreadsheets, notebooks, and relational databases to facilitate data curation by enabling easy singleton operations and manual transformations within a relational framework.
Contribution
It proposes a hybrid environment that allows manual, singleton data operations in relational databases, addressing a key limitation of set-based query processing.
Findings
Identifies the difficulty of expressing singleton operations (one-off exceptions to set-at-a-time processing) in relational databases.
Proposes a hybrid spreadsheet/relational environment for data curation.
Presents the Vizier system as a solution for flexible data transformations.
Abstract
The database community has developed numerous tools and techniques for data curation and exploration, from declarative languages, to specialized techniques for data repair, and more. Yet, there is currently no consensus on how to best expose these powerful tools to an analyst in a simple, intuitive, and above all, flexible way. Thus, analysts continue to rely on tools such as spreadsheets, imperative languages, and notebook style programming environments like Jupyter for data curation. In this work, we explore the integration of spreadsheets, notebooks, and relational databases. We focus on a key advantage that both spreadsheets and imperative notebook environments have over classical relational databases: ease of exception. By relying on set-at-a-time operations, relational databases sacrifice the ability to easily define singleton operations, exceptions to a normal data processing…
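To make the tension between set-at-a-time processing and singleton exceptions concrete, the following Python sketch (standard library sqlite3 only) applies one rule, a Fahrenheit-to-Celsius conversion, to every row. It records a manual cell fix as an exception in a separate table, which is merged in at query time instead of rewriting the rule. The table names readings and edits, the helper functions, and the sample data are hypothetical. This is one common way to layer exceptions over a relational query, not a description of Vizier's actual mechanism.

```python
import sqlite3

# Set-at-a-time rule: convert every reading to Celsius.
# The LEFT JOIN + COALESCE lets a recorded manual edit (an "exception")
# override the stored value for a single cell without changing the rule.
QUERY = """
    SELECT r.id, r.city,
           (COALESCE(e.value, r.temp_f) - 32) * 5.0 / 9.0 AS temp_c
    FROM readings r
    LEFT JOIN edits e ON e.row_id = r.id AND e.col = 'temp_f'
    ORDER BY r.id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE readings (id INTEGER PRIMARY KEY, city TEXT, temp_f REAL);
        INSERT INTO readings VALUES
            (1, 'NYC', 68.0),
            (2, 'Boston', 50.0),
            (3, 'Buffalo', -999.0);  -- sentinel value that needs a manual fix
        -- Hypothetical exception table: at most one edit per (row, column).
        CREATE TABLE edits (
            row_id INTEGER,
            col TEXT,
            value REAL,
            PRIMARY KEY (row_id, col)
        );
    """)
    return conn


def celsius_view(conn: sqlite3.Connection) -> list:
    """Evaluate the rule over all rows, with exceptions applied."""
    return conn.execute(QUERY).fetchall()


def record_edit(conn: sqlite3.Connection, row_id: int, col: str, value: float) -> None:
    """Record a singleton exception; a later edit to the same cell replaces it."""
    conn.execute(
        "INSERT OR REPLACE INTO edits (row_id, col, value) VALUES (?, ?, ?)",
        (row_id, col, value),
    )


if __name__ == "__main__":
    conn = build_demo_db()
    print(celsius_view(conn))  # Buffalo still shows the converted sentinel
    record_edit(conn, 3, "temp_f", 23.0)
    print(celsius_view(conn))  # Buffalo now reflects the manual fix
```

The rule stays declarative and applies to every row, while the edits table keeps each exception explicit and inspectable. Because edits are keyed by (row_id, col), re-editing a cell replaces the earlier exception instead of duplicating rows in the join.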
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Advanced Database Systems and Queries
