Formally Exploring Time-Series Anomaly Detection Evaluation Metrics
Dennis Wagner, Arjun Nair, Billy Joe Franks, Justus Arweiler, Aparna Muraleedharan, Indra Jungjohann, Fabian Hartung, Mayank C. Ahuja, Andriy Balinskyy, Saurabh Varshneya, Nabeel Hussain Syed, Mayank Nagda, Phillip Liznerski, Steffen Reithermann, Maja Rudolph, Sebastian Vollmer

TL;DR
This paper introduces a formal framework for evaluating time-series anomaly detection metrics, revealing limitations of existing metrics and proposing new, more reliable ones that satisfy essential evaluation properties.
Contribution
The paper formalizes key properties for anomaly detection metrics, analyzes existing metrics against these properties, and proposes LARM and ALARM as new metrics that meet all criteria.
Findings
Of the 37 widely used metrics analyzed, most satisfy only a few of the formal evaluation properties.
None of the 37 metrics satisfies all of the properties.
LARM and ALARM are proposed as new metrics satisfying all properties.
Abstract
Undetected anomalies in time series can trigger catastrophic failures in safety-critical systems, such as chemical plant explosions or power grid outages. Although many detection methods have been proposed, their performance remains unclear because current metrics capture only narrow aspects of the task and often yield misleading results. We address this issue by introducing verifiable properties that formalize essential requirements for evaluating time-series anomaly detection. These properties enable a theoretical framework that supports principled evaluations and reliable comparisons. Analyzing 37 widely used metrics, we show that most satisfy only a few properties, and none satisfy all, explaining persistent inconsistencies in prior results. To close this gap, we propose LARM, a flexible metric that provably satisfies all properties, and extend it to ALARM, an advanced variant…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Software System Performance and Reliability
