We Need to Rethink Benchmarking in Anomaly Detection
Philipp Röchner, Simon Klüttermann, Franz Rothlauf, Daniel Schlör

TL;DR
This position paper argues that current benchmarking practices in anomaly detection are inadequate and calls for rethinking them around scenario diversity, end-to-end analysis, and meaningful evaluation aligned with application objectives.
Contribution
The paper offers a new perspective on benchmarking in anomaly detection, advocating scenario-based evaluation, comprehensive pipeline analysis, and scenario-specific metrics.
Findings
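On current benchmarks, new algorithms show only minor performance differences from established baselines.
Current benchmarking does not sufficiently reflect the diversity of anomalies across applications.
Proposed improvements include a common scenario taxonomy and end-to-end evaluation of anomaly detection pipelines.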
Abstract
Despite the continuous proposal of new anomaly detection algorithms and extensive benchmarking efforts, progress seems to stagnate, with only minor performance differences between established baselines and new algorithms. In this position paper, we argue that this stagnation is due to limitations in how we evaluate anomaly detection algorithms. Current benchmarking does not, for example, sufficiently reflect the diversity of anomalies in applications ranging from predictive maintenance to scientific discovery. Consequently, we need to rethink benchmarking in anomaly detection. In our opinion, anomaly detection should be studied using scenarios that capture the relevant characteristics of different applications. We identify three key areas for improvement: First, we need to identify anomaly detection scenarios based on a common taxonomy. Second, anomaly detection pipelines should be…
