Mosaic: A Sample-Based Database System for Open World Query Processing
Laurel Orr, Samuel Ainsworth, Walter Cai, Kevin Jamieson, Magda Balazinska, Dan Suciu

TL;DR
Mosaic is a novel database system designed to answer population queries accurately, directly over biased sample data, by treating samples as first-class citizens and extending SQL so that users can ask questions about the population a sample represents.
Contribution
It introduces a sample-based data model and SQL extensions to facilitate population query answering from biased samples with unknown sampling probabilities.
Findings
Preliminary results demonstrate the feasibility of the proposed query answering techniques.
The proposed techniques are designed to handle biased samples without requiring prior knowledge of the sampling probabilities.
Abstract
Data scientists have relied on samples to analyze populations of interest for decades. Recently, with the increase in the number of public data repositories, sample data has become easier to access. It has not, however, become easier to analyze. This sample data is arbitrarily biased with an unknown sampling probability, meaning data scientists must manually debias the sample with custom techniques to avoid inaccurate results. In this vision paper, we propose Mosaic, a database system that treats samples as first-class citizens and allows users to ask questions over populations represented by these samples. Answering queries over biased samples is non-trivial as there is no existing, standard technique to answer population queries when the sampling probability is unknown. In this paper, we show how our envisioned system solves this problem by having a unique sample-based data model with…
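The abstract notes that data scientists currently have to debias samples by hand. As one illustration of such a manual step, the sketch below applies post-stratification, a standard survey-weighting technique; it is not Mosaic's method. It assumes the population size of each stratum (here, each hypothetical `region`) is known, so every sample row gets weight N_h / n_h. Mosaic targets the harder setting where sampling probabilities are unknown. All function names and data below are hypothetical.

```python
from collections import Counter

# Hypothetical helpers illustrating post-stratification, a standard survey-
# weighting technique; this is NOT Mosaic's published algorithm.

def poststratify_weights(sample, stratum_of, population_counts):
    """Weight each row by N_h / n_h: the population size of its stratum h
    divided by the number of sample rows that fall in h."""
    sample_counts = Counter(stratum_of(row) for row in sample)
    return [population_counts[stratum_of(row)] / sample_counts[stratum_of(row)]
            for row in sample]

def weighted_count(sample, weights, predicate):
    """Estimate how many population members satisfy `predicate`."""
    return sum(w for row, w in zip(sample, weights) if predicate(row))

def weighted_mean(sample, weights, value):
    """Estimate the population mean of `value`."""
    return sum(w * value(row) for row, w in zip(sample, weights)) / sum(weights)

# Toy biased sample: region B is heavily over-represented.
sample = ([{"region": "A", "income": 50}, {"region": "A", "income": 70}]
          + [{"region": "B", "income": 30} for _ in range(8)])
# Assumed known population totals per region (auxiliary information).
population_counts = {"A": 800, "B": 200}

weights = poststratify_weights(sample, lambda r: r["region"], population_counts)
naive_mean = sum(r["income"] for r in sample) / len(sample)
est_mean = weighted_mean(sample, weights, lambda r: r["income"])
est_high_earners = weighted_count(sample, weights, lambda r: r["income"] > 40)

print(naive_mean)        # 36.0
print(est_mean)          # 54.0
print(est_high_earners)  # 800.0
```

The naive mean (36.0) is pulled toward the over-sampled region, while the reweighted estimate (54.0) reflects the assumed population mix. This only works because the per-stratum population counts are given, which is exactly the kind of side information that is often missing for public sample data.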
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Quality and Management
