QFix: Diagnosing errors through query histories
Xiaolan Wang, Alexandra Meliou, Eugene Wu

TL;DR
QFix is a framework that diagnoses and repairs data errors by analyzing query histories, improving data correctness in applications where errors are introduced and propagated through updates.
Contribution
QFix formalizes the diagnosis problem over query logs, develops exact and optimized methods for detecting and repairing errors, and scales to large datasets by trading some accuracy for performance.
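To make the idea of diagnosing errors from a query log concrete, here is a toy sketch in Python. It is not the paper's method: it swaps in a brute-force search over one query parameter. The table, the two queries, the `threshold` parameter, and the helper names (`replay`, `diagnose`) are all hypothetical and chosen only for illustration. The sketch replays a small log with different parameter values and keeps the values that resolve the reported complaints without changing any other tuple.

```python
import copy

# Hypothetical toy log with two update queries over a "taxes" table.
def q1(rows, threshold):
    # UPDATE taxes SET rate = 30 WHERE income >= threshold
    for r in rows:
        if r["income"] >= threshold:
            r["rate"] = 30

def q2(rows):
    # UPDATE taxes SET owed = income * rate / 100
    for r in rows:
        r["owed"] = r["income"] * r["rate"] / 100

def replay(initial, threshold):
    """Re-run the whole log on a copy of the initial state."""
    rows = copy.deepcopy(initial)
    q1(rows, threshold)
    q2(rows)
    return rows

def diagnose(initial, logged_threshold, complaints):
    """Return thresholds for q1 that fix the complaints and nothing else.

    complaints maps a tuple id to the 'owed' value the user says is correct.
    Candidates are limited to income values present in the data (an assumption
    of this sketch, not a claim about QFix).
    """
    dirty = replay(initial, logged_threshold)
    # Tuples without a complaint are assumed correct in the observed state.
    expected = {r["id"]: r["owed"] for r in dirty}
    expected.update(complaints)

    repairs = []
    for t in sorted({r["income"] for r in initial}):
        fixed = replay(initial, t)
        if all(r["owed"] == expected[r["id"]] for r in fixed):
            repairs.append(t)
    return repairs

initial = [
    {"id": 1, "income": 9500, "rate": 10, "owed": 950.0},
    {"id": 2, "income": 86000, "rate": 25, "owed": 21500.0},
    {"id": 3, "income": 86500, "rate": 25, "owed": 21625.0},
    {"id": 4, "income": 90000, "rate": 25, "owed": 22500.0},
]

# The log ran q1 with threshold 85700; users complain about tuples 2 and 3.
repairs = diagnose(initial, 85700, {2: 21500.0, 3: 21625.0})
print(repairs)  # [90000]
```

Note that the second query propagates the first query's mistake into `owed`, so the complaints concern a column the faulty query never touched directly. Replaying the log from the initial state is what ties the symptom back to its cause. Exhaustive replay like this grows quickly with log length and parameter domain, so it only serves to illustrate the problem statement.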
Findings
Effective in identifying data errors caused by query mistakes
Scales to large datasets with near-optimal accuracy
Demonstrated success on benchmark and synthetic data
Abstract
Data-driven applications rely on the correctness of their data to function properly and effectively. Errors in data can be incredibly costly and disruptive, leading to loss of revenue, incorrect conclusions, and misguided policy decisions. While data cleaning tools can purge datasets of many errors before the data is used, applications and users interacting with the data can introduce new errors. Subsequent valid updates can obscure these errors and propagate them through the dataset, causing more discrepancies. Even when some of these discrepancies are discovered, they are often corrected superficially, on a case-by-case basis, further obscuring the true underlying cause and making detection of the remaining errors harder. In this paper, we propose QFix, a framework that derives explanations and repairs for discrepancies in relational data, by analyzing the effect of queries that…
Taxonomy
Topics: Data Quality and Management · Software System Performance and Reliability · Cloud Computing and Resource Management
