Data Imputation through the Identification of Local Anomalies
Huseyin Ozkan, Ozgun S. Pelvan, Suleyman S. Kozat

TL;DR
This paper presents a statistical, model-free framework for detecting, localizing, and imputing localized data corruptions. It introduces a corruption separation algorithm, a MAP imputation estimator, and a rank-based distance measure that together improve corruption separation and data imputation in noisy datasets.
Contribution
The paper introduces a new algorithm for detecting and localizing corruptions, a MAP estimator for imputing the corrupted data, and a novel distance measure based on ranked deviations among the data attributes, all within a single framework.
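To make the idea of a rank-based generalization of Euclidean distance concrete, here is a minimal Python sketch of one possible construction. The summary does not give the paper's definition, so the function name `ranked_deviation_distance` and the rank-weighting scheme are hypothetical assumptions: absolute per-attribute deviations are sorted in descending order and combined with rank-indexed weights, so that unit weights recover the ordinary Euclidean distance.

```python
import math


def ranked_deviation_distance(x, y, weights=None):
    """Hypothetical rank-weighted distance (not the paper's definition).

    Absolute attribute deviations |x_i - y_i| are sorted from largest to
    smallest; the r-th largest squared deviation is scaled by weights[r].
    With all weights equal to 1 this is exactly the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Rank the deviations: index 0 holds the largest deviation.
    devs = sorted((abs(a - b) for a, b in zip(x, y)), reverse=True)
    if weights is None:
        weights = [1.0] * len(devs)  # unit weights -> Euclidean distance
    if len(weights) != len(devs):
        raise ValueError("need one weight per rank")
    return math.sqrt(sum(w * d * d for w, d in zip(weights, devs)))
```

For example, with `x = [0, 0, 0, 0]` and `y = [3, 4, 0, 0]` the default weights give 5.0 (the Euclidean distance), while `weights=[1, 0, 0, 0]` keeps only the largest deviation and gives 4.0. Emphasizing the top-ranked deviations highlights a few strongly deviating attributes, which is what a localized corruption would produce; whether the paper weights ranks this way is not stated here.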
Findings
Effective corruption detection and localization demonstrated on multiple datasets.
Significant improvement in classification accuracy on corrupted data.
Robust performance across different training conditions.
Abstract
We introduce a comprehensive statistical framework in a model-free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose i) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As a generalization of Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and is empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics…
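The split-and-test procedure described in the abstract can be sketched as follows, under assumptions that are ours rather than the paper's: the nominal statistics are per-attribute means and standard deviations, each part of the attribute vector is tested by its mean squared z-score against a hypothetical threshold `tau`, and flagged attributes are imputed with the nominal mean. That imputation is the MAP estimate only under an independent-Gaussian model; the paper's model-free MAP estimator is not reproduced here. All function names are hypothetical.

```python
from statistics import mean, pstdev


def nominal_stats(train):
    """Per-attribute mean and standard deviation of nominal (clean) rows."""
    cols = list(zip(*train))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return mu, sd


def part_score(x, mu, sd, lo, hi):
    """Hypothetical local test statistic: mean squared z-score of x[lo:hi]."""
    return sum(((x[i] - mu[i]) / sd[i]) ** 2 for i in range(lo, hi)) / (hi - lo)


def localize(x, mu, sd, tau=9.0, min_size=1):
    """Walk a binary partition of the attribute indices; return flagged indices.

    A part that passes the test is pruned with its whole subtree; a failing
    part is halved until it reaches min_size attributes and is then flagged.
    """
    if not x:
        return []
    flagged = []
    stack = [(0, len(x))]  # root part covers every attribute
    while stack:
        lo, hi = stack.pop()
        if part_score(x, mu, sd, lo, hi) <= tau:
            continue  # part looks nominal: skip its children
        if hi - lo <= min_size:
            flagged.extend(range(lo, hi))  # smallest anomalous part
        else:
            mid = (lo + hi) // 2
            stack.extend([(lo, mid), (mid, hi)])
    return sorted(flagged)


def impute(x, flagged, mu):
    """Replace flagged attributes with the nominal mean (MAP only under an
    independent-Gaussian assumption; a simplification, not the paper's estimator)."""
    y = list(x)
    for i in flagged:
        y[i] = mu[i]
    return y
```

Pruning a part as soon as it passes the test is what keeps the search cheap. The trade-off in this sketch is that a corruption diluted below `tau` within a large parent part is never revisited, so `tau` and `min_size` control the balance between efficiency and sensitivity.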
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Digital Media Forensic Detection · Water Systems and Optimization
