A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection
Kaixiang Yang, Jiarong Liu, Yupeng Song, Shuanghua Yang, Yujue Zhou

TL;DR
This paper proposes a problem-oriented framework for understanding evaluation metrics in time series anomaly detection. It groups over twenty metrics into six dimensions so that a metric can be chosen according to application needs.
Contribution
It introduces a unified taxonomy that reinterprets existing metrics based on evaluation challenges, and provides experimental insights into their discriminative abilities and robustness.
Findings
Most event-level metrics show strong separability between true and random detections.
Some widely used metrics like NAB and Point-Adjust are vulnerable to random-score inflation.
The suitability of metrics depends on specific operational objectives and application contexts.
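To make the random-score inflation finding concrete, here is a minimal sketch of how Point-Adjust can reward a detector that fires at random. It assumes the commonly used point-adjust convention (if any point inside a ground-truth anomalous segment is flagged, the whole segment is counted as detected); the paper's exact protocol is not given in this summary. The function names, the synthetic label layout, and the 2% firing rate are all hypothetical choices for illustration.

```python
import numpy as np


def f1_score(labels, preds):
    """Point-wise F1 for binary arrays (1 = anomaly)."""
    tp = int(np.sum((labels == 1) & (preds == 1)))
    fp = int(np.sum((labels == 0) & (preds == 1)))
    fn = int(np.sum((labels == 1) & (preds == 0)))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def point_adjust(labels, preds):
    """Mark a whole ground-truth segment as detected if any point in it is flagged."""
    adjusted = preds.copy()
    # Pad with zeros so segments touching either end are still found.
    padded = np.concatenate(([0], labels, [0]))
    diff = np.diff(padded)
    starts = np.where(diff == 1)[0]   # inclusive start index of each segment
    ends = np.where(diff == -1)[0]    # exclusive end index of each segment
    for s, e in zip(starts, ends):
        if preds[s:e].any():
            adjusted[s:e] = 1
    return adjusted


# Synthetic labels (assumed layout): five anomalous segments of 200 points each.
labels = np.zeros(10_000, dtype=int)
for start in (1_000, 3_000, 5_000, 7_000, 9_000):
    labels[start:start + 200] = 1

# A "detector" that fires at random on about 2% of points, with no real signal.
rng = np.random.default_rng(0)
random_preds = (rng.random(labels.size) < 0.02).astype(int)

raw_f1 = f1_score(labels, random_preds)
pa_f1 = f1_score(labels, point_adjust(labels, random_preds))
print(f"raw F1: {raw_f1:.3f}  point-adjusted F1: {pa_f1:.3f}")
```

Because long segments are very likely to contain at least one random hit, the adjustment converts a handful of lucky points into full-segment credit while leaving false positives unchanged. That mechanism is one plausible reading of why a metric of this kind separates genuine from random detections poorly.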
Abstract
Time series anomaly detection is widely used in IoT and cyber-physical systems, yet its evaluation remains challenging due to diverse application objectives and heterogeneous metric assumptions. This study introduces a problem-oriented framework that reinterprets existing metrics based on the specific evaluation challenges they are designed to address, rather than their mathematical forms or output structures. We categorize over twenty commonly used metrics into six dimensions: 1) basic accuracy-driven evaluation; 2) timeliness-aware reward mechanisms; 3) tolerance to labeling imprecision; 4) penalties reflecting human-audit cost; 5) robustness against random or inflated scores; and 6) parameter-free comparability for cross-dataset benchmarking. Comprehensive experiments are conducted to examine metric behavior under genuine, random, and oracle detection scenarios. By comparing their…
