A Generic Machine Learning Framework for Fully-Unsupervised Anomaly Detection with Contaminated Data
Markus Ulmer, Jannik Zgraggen, and Lilach Goren Huber

TL;DR
This paper presents a universal unsupervised framework that refines contaminated training data for anomaly detection, improving performance even when normal data is scarce or mixed with anomalies.
Contribution
It introduces a generic, fully unsupervised method for refining contaminated training data that can be applied to residual-based anomaly detection models. The method outperforms naive training on the contaminated data and approaches the performance of training on anomaly-free data.
Findings
Framework improves anomaly detection performance when the training data is contaminated
Outperforms naive contaminated data training methods
Often rivals training with clean, anomaly-free data
Abstract
Anomaly detection (AD) tasks have been solved using machine learning algorithms in various domains and applications. The great majority of these algorithms use normal data to train a residual-based model and assign anomaly scores to unseen samples based on their dissimilarity with the learned normal regime. The underlying assumption of these approaches is that anomaly-free data is available for training. This is, however, often not the case in real-world operational settings, where the training data may be contaminated with an unknown fraction of abnormal samples. Training with contaminated data, in turn, inevitably leads to a deteriorated AD performance of the residual-based algorithms. In this paper we introduce a framework for a fully unsupervised refinement of contaminated training data for AD tasks. The framework is generic and can be applied to any residual-based machine…
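To make the residual-based setting concrete, the following is a minimal sketch of training on contaminated data and then refining the training set. It is not the authors' framework, whose details are not given in this summary. It assumes a simple stand-in: a least-squares linear model as the residual-based detector, and a generic trimming heuristic that repeatedly drops samples whose residual exceeds a robust threshold (median plus `k` scaled median absolute deviations). The function names, the synthetic data, and the parameters `n_rounds` and `k` are all hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w + b; returns the weights with the bias last."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def residuals(w, X, y):
    """Absolute prediction error, used here as the anomaly score."""
    A = np.column_stack([X, np.ones(len(X))])
    return np.abs(y - A @ w)

def refine_training_set(X, y, n_rounds=5, k=3.0):
    """Hypothetical refinement loop (an assumption, not the paper's algorithm):
    fit on the currently kept samples, score all samples, and keep only those
    whose residual is below a robust threshold. Stops early if nothing changes."""
    keep = np.ones(len(X), dtype=bool)
    for _ in range(n_rounds):
        w = fit_linear(X[keep], y[keep])
        r = residuals(w, X, y)
        med = np.median(r[keep])
        mad = np.median(np.abs(r[keep] - med)) + 1e-12  # avoid a zero threshold width
        new_keep = r <= med + k * 1.4826 * mad
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    # Refit once on the final refined set so the returned model matches `keep`.
    return keep, fit_linear(X[keep], y[keep])

# Synthetic example: 500 samples, the first 50 of which are contaminated.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=500)
y[:50] += 5.0  # unknown-to-the-model anomalies mixed into the training data

w_naive = fit_linear(X, y)            # naive training on contaminated data
keep, w_refined = refine_training_set(X, y)

print("naive weights:  ", np.round(w_naive, 3))
print("refined weights:", np.round(w_refined, 3))
print("contaminated samples kept:", int(keep[:50].sum()))
```

In this toy setup the naive fit absorbs part of the contamination into its bias term, which shrinks the residuals of the abnormal samples. The refined fit is trained only on the samples that survive trimming. The threshold rule is one of many possible choices, and a real residual-based model (for example an autoencoder or a forecaster) would take the place of the linear fit.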
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Data Stream Mining Techniques
