Repairing Systematic Outliers by Learning Clean Subspaces in VAEs
Simao Eduardo, Kai Xu, Alfredo Nazabal, Charles Sutton

TL;DR
This paper introduces CLSVAE, a semi-supervised VAE model that effectively detects and repairs systematic outliers in data by modeling inliers and outliers separately in a partitioned latent space, requiring minimal labeled data.
Contribution
The paper presents CLSVAE, a novel semi-supervised approach that models inlier and outlier patterns in separate latent subspaces for improved systematic outlier detection and repair.
Findings
CLSVAE produces better repairs than the baselines on image data.
It is often effective with labels for less than 2% of the data.
With only 0.25% of the data labelled, it achieves a 58% relative error reduction compared to the closest baseline.
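As a reading aid for the last finding, a relative error reduction is usually computed as the drop in error divided by the baseline error. The sketch below assumes that standard definition; the function name and the example error values are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, model_error: float) -> float:
    """Fraction by which model_error is lower than baseline_error.

    Assumes the common definition (baseline - model) / baseline;
    the summary does not state the exact formula the authors used.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


# Hypothetical numbers, chosen only to illustrate the arithmetic:
# a baseline repair error of 0.50 and a model repair error of 0.21.
reduction = relative_error_reduction(0.50, 0.21)
print(f"{reduction:.0%}")
```

Under this definition, a 58% reduction means the model's repair error is 42% of the baseline's, whatever the absolute error scale is.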
Abstract
Data cleaning often comprises outlier detection and data repair. Systematic errors result from nearly deterministic transformations that occur repeatedly in the data, e.g. specific image pixels being set to default values or watermarks. Consequently, models with enough capacity easily overfit to these errors, making detection and repair difficult. Seeing as a systematic outlier is a combination of patterns of a clean instance and systematic error patterns, our main insight is that inliers can be modelled by a smaller representation (subspace) in a model than outliers. By exploiting this, we propose Clean Subspace Variational Autoencoder (CLSVAE), a novel semi-supervised model for detection and automated repair of systematic errors. The main idea is to partition the latent space and model inlier and outlier patterns separately. CLSVAE is effective with much less labelled data compared to…
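The partitioning idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear decoder, the latent sizes, and the gating scheme (the outlier subspace is switched off to produce a repair) are all assumptions made for illustration, and every name below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: data dimension, clean subspace, outlier ("dirty") subspace.
D_X, D_CLEAN, D_DIRTY = 16, 4, 2

# Random linear maps standing in for a trained decoder network (assumption).
W_clean = rng.normal(size=(D_CLEAN, D_X))
W_dirty = rng.normal(size=(D_DIRTY, D_X))


def decode(z_clean, z_dirty, use_dirty):
    """Reconstruct x from a partitioned latent code.

    use_dirty is 1.0 for rows treated as outliers and 0.0 for inliers,
    so inliers are explained by the clean subspace alone.
    """
    gate = np.asarray(use_dirty, dtype=float)[..., None]
    return z_clean @ W_clean + gate * (z_dirty @ W_dirty)


def repair(z_clean, z_dirty):
    """Decode with the outlier subspace switched off for every row."""
    return decode(z_clean, z_dirty, use_dirty=np.zeros(z_clean.shape[0]))


# Three hypothetical latent codes, decoded as outliers and then repaired.
z_c = rng.normal(size=(3, D_CLEAN))
z_d = rng.normal(size=(3, D_DIRTY))
x_outlier = decode(z_c, z_d, use_dirty=np.ones(3))
x_repaired = repair(z_c, z_d)
print(x_outlier.shape, x_repaired.shape)
```

In this sketch the repaired output depends only on the clean part of the code, which mirrors the abstract's claim that inliers can be modelled by a smaller representation than outliers. How the real model infers the codes and the inlier/outlier gate from data is not covered here.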
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
Methods: Repair
