Detecting Errors in a Numerical Response via any Regression Model
Hang Zhou, Jonas Mueller, Mayank Kumar, Jane-Ling Wang, Jing Lei

TL;DR
This paper introduces a method for detecting errors in numerical responses within regression datasets by using veracity scores, providing theoretical guarantees and a new benchmark for evaluation.
Contribution
It proposes a novel error detection approach with veracity scores, along with theoretical analysis and a new benchmark dataset for real-world numerical errors.
Findings
Outperforms existing methods in error detection precision and recall.
Provides theoretical guarantees for the filtering procedure.
Introduces a new benchmark dataset with real-world numerical errors.
Abstract
Noise plagues many numerical datasets, where the recorded values in the data may fail to match the true underlying values for reasons including erroneous sensors, data entry/processing mistakes, or imperfect human estimates. We consider general regression settings with covariates and a potentially corrupted response whose observed values may contain errors. By accounting for various uncertainties, we introduce veracity scores that distinguish between genuine errors and natural data fluctuations, conditioned on the available covariate information in the dataset. We propose a simple yet efficient filtering procedure for eliminating potential errors, and establish theoretical guarantees for our method. We also contribute a new error detection benchmark involving 5 regression datasets with real-world numerical errors (for which the true values are also known). In this benchmark and…
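The abstract does not give the formula for the veracity scores, so the following is only a rough sketch of the general idea (score each observed response against what a regression model predicts from the covariates, then filter the lowest-scoring rows). Everything here is an assumption made for illustration: the ordinary least squares model, the out-of-fold fitting, the `exp(-|residual| / scale)` score, and the helper names `out_of_fold_predictions`, `illustrative_scores`, and `flag_lowest` are hypothetical and are not the authors' definitions.

```python
import numpy as np

def out_of_fold_predictions(X, y, n_folds=5, seed=0):
    """Predict each response from a least-squares model fit on the other folds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    Xb = np.column_stack([np.ones(n), X])  # prepend an intercept column
    preds = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        coef, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
        preds[held_out] = Xb[held_out] @ coef
    return preds

def illustrative_scores(y, preds):
    """Hypothetical score in [0, 1]; lower means more suspicious.

    Assumption: absolute residual divided by a robust scale (the median
    absolute residual). This is NOT the paper's veracity score.
    """
    resid = np.abs(y - preds)
    scale = np.median(resid) + 1e-12  # guard against a zero scale
    return np.exp(-resid / scale)

def flag_lowest(scores, k):
    """Return the indices of the k lowest-scoring rows."""
    return np.argsort(scores)[:k]

# Synthetic demo: a linear signal with three deliberately corrupted responses.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y_true = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
y_obs = y_true.copy()
corrupted = np.array([5, 50, 150])
y_obs[corrupted] += 10.0  # inject large errors into the recorded response

preds = out_of_fold_predictions(X, y_obs)
scores = illustrative_scores(y_obs, preds)
flagged = flag_lowest(scores, k=3)
print(sorted(flagged.tolist()))
```

Out-of-fold predictions are used so that a corrupted response cannot pull the model toward itself before being scored. Unlike this sketch, the method described in the abstract also accounts for uncertainty, so that natural fluctuations in the response are not mistaken for errors.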
Taxonomy
Topics: Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications · Advanced Statistical Methods and Models
