Query-Guided Analysis and Mitigation of Data Verification Errors (Extended Version)
Ran Schreiber, Yael Amsterdamer

TL;DR
This paper introduces a framework that assesses and mitigates the impact of labeling errors in data verification on query results, using a new uncertainty metric and algorithms to guide additional verification for improved data quality.
Contribution
The paper proposes Maximal Error Score (MES), a worst-case uncertainty metric for query output tuples, together with algorithms that identify risky tuples and guide additional verification.
Findings
MESReduce significantly reduces MES in experiments.
The framework improves the accuracy of verification results.
Algorithms effectively identify tuples where verification is most beneficial.
Abstract
Data verification, the process of labeling data items as correct or incorrect, is a preprocessing step that may critically affect the quality of results in data-driven pipelines. Despite recent advances, verification can still produce erroneous labels that propagate to downstream query results in complex ways. We present a framework that complements existing verification tools by assessing the impact of potential labeling errors on query outputs and guiding additional verification steps to improve result reliability. To this end, we introduce Maximal Error Score (MES), a worst-case uncertainty metric that quantifies the reliability of query output tuples independently of the underlying data distribution. As an auxiliary indicator, we identify risky tuples - input tuples for which reducing label uncertainty may counterintuitively increase the output uncertainty. We then develop efficient…
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Data Management and Algorithms
