Evaluating Anomaly Detectors for Simulated Highly Imbalanced Industrial Classification Problems
Lesley Wheat, Martin v. Mohrenschildt, Saeid Habibi

TL;DR
This study evaluates 14 anomaly detection algorithms on simulated industrial datasets with extreme class imbalance, showing how their performance varies with training set size, feature count, and the number of available faulty examples.
Contribution
The paper provides a benchmark of 14 anomaly detectors on synthetic industrial data, highlighting the impact of data quantity and feature count on detection performance.
Findings
Unsupervised methods excel with fewer faulty examples.
Semi-supervised and supervised methods improve with more faulty data.
Generalization performance drops on smaller datasets.
Abstract
Machine learning offers potential solutions to current issues in industrial systems in areas such as quality control and predictive maintenance, but also faces unique barriers in industrial applications. An ongoing challenge is extreme class imbalance, primarily due to the limited availability of faulty data during training. This paper presents a comprehensive evaluation of anomaly detection algorithms using a problem-agnostic simulated dataset that reflects real-world engineering constraints. Using a synthetic dataset with a hypersphere-based anomaly distribution in 2D and 10D, we benchmark 14 detectors across training datasets with anomaly rates between 0.05% and 20% and training sizes between 1 000 and 10 000 (with a testing dataset size of 40 000) to assess performance and generalization error. Our findings reveal that the best detector is highly dependent on the total number of…
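The dataset setup described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact generative process, so everything below is an assumption made for illustration: normal points are drawn uniformly inside a unit ball, anomalies are drawn from a surrounding shell, and the function name, radii, and seeds are hypothetical. Only the dimensionalities (2D and 10D), the anomaly-rate range, and the training sizes come from the abstract.

```python
import numpy as np

def make_hypersphere_dataset(n_samples, anomaly_rate, n_features, seed=0,
                             normal_radius=1.0, anomaly_shell=(1.5, 2.5)):
    """Hypothetical generator (not the paper's): normal points inside a ball,
    anomalies in a spherical shell around it. Returns (X, y) with y=1 for anomalies."""
    rng = np.random.default_rng(seed)
    n_anom = max(1, int(round(n_samples * anomaly_rate)))
    n_norm = n_samples - n_anom

    def sample_shell(n, r_lo, r_hi):
        # Uniform directions: normalise standard Gaussian vectors.
        d = rng.normal(size=(n, n_features))
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        # Radii chosen so points are uniform in volume between r_lo and r_hi.
        u = rng.uniform(size=(n, 1))
        r = (r_lo**n_features + u * (r_hi**n_features - r_lo**n_features)) ** (1.0 / n_features)
        return d * r

    X = np.vstack([sample_shell(n_norm, 0.0, normal_radius),
                   sample_shell(n_anom, *anomaly_shell)])
    y = np.concatenate([np.zeros(n_norm, dtype=int), np.ones(n_anom, dtype=int)])
    perm = rng.permutation(n_samples)  # shuffle so anomalies are not grouped at the end
    return X[perm], y[perm]

# Two ends of the ranges quoted in the abstract (assumed pairings):
X2, y2 = make_hypersphere_dataset(10_000, 0.0005, n_features=2)   # 0.05% anomalies, 2D
X10, y10 = make_hypersphere_dataset(1_000, 0.20, n_features=10)   # 20% anomalies, 10D
print(X2.shape, int(y2.sum()), X10.shape, int(y10.sum()))
```

At a 0.05% anomaly rate, a training set of 10 000 samples contains only five faulty examples, which gives a sense of how little supervised signal is available at the low end of the range.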
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
