Statistical Evaluation of Anomaly Detectors for Sequences
Erik Scharwächter, Emmanuel Müller

TL;DR
This paper formalizes precision and recall with temporal tolerance for point-based anomaly detection in sequential data, shows that these measures can overestimate detector performance, and proposes null distributions for assessing the statistical significance of reported results.
Contribution
It introduces a formal notion of time-tolerant precision and recall for sequence anomaly detection and provides statistical tools to evaluate their significance.
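To make the idea of temporal tolerance concrete, here is a minimal Python sketch. It assumes one plausible reading of the measures, which may differ from the paper's formal definitions: a detection counts as correct if a true anomaly lies within `tau` time steps of it, and a true anomaly counts as found if a detection lies within `tau` time steps of it. The names `has_neighbor`, `tolerant_precision_recall`, and `tau` are hypothetical and chosen for illustration only.

```python
from bisect import bisect_left


def has_neighbor(sorted_times, t, tau):
    """Return True if some element of sorted_times lies in [t - tau, t + tau]."""
    i = bisect_left(sorted_times, t - tau)
    return i < len(sorted_times) and sorted_times[i] <= t + tau


def tolerant_precision_recall(true_times, pred_times, tau):
    """Precision and recall with temporal tolerance tau (assumed definition).

    precision: fraction of detections with a true anomaly within tau steps.
    recall:    fraction of true anomalies with a detection within tau steps.
    """
    true_sorted = sorted(true_times)
    pred_sorted = sorted(pred_times)
    matched_preds = sum(has_neighbor(true_sorted, p, tau) for p in pred_sorted)
    matched_trues = sum(has_neighbor(pred_sorted, t, tau) for t in true_sorted)
    precision = matched_preds / len(pred_sorted) if pred_sorted else 0.0
    recall = matched_trues / len(true_sorted) if true_sorted else 0.0
    return precision, recall


# Illustrative data, not taken from the paper.
true_times = [10, 50, 90]
pred_times = [12, 48, 70]

print(tolerant_precision_recall(true_times, pred_times, tau=0))  # exact matching
print(tolerant_precision_recall(true_times, pred_times, tau=3))  # tolerant matching
```

With `tau=0` none of the detections coincide with a true anomaly, so both measures are zero; with `tau=3` two of the three detections and two of the three anomalies are matched. Under this reading, the same detector output scores higher purely because the tolerance grew.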
Findings
Precision and recall computed with temporal tolerance can overestimate detector performance.
Null distributions enable significance assessment of detection results.
A statistical simulation study demonstrates this overestimation and motivates checking reported results for statistical significance.
Abstract
Although precision and recall are standard performance measures for anomaly detection, their statistical properties in sequential detection settings are poorly understood. In this work, we formalize a notion of precision and recall with temporal tolerance for point-based anomaly detection in sequential data. These measures are based on time-tolerant confusion matrices that may be used to compute time-tolerant variants of many other standard measures. However, care has to be taken to preserve interpretability. We perform a statistical simulation study to demonstrate that precision and recall may overestimate the performance of a detector when computed with temporal tolerance. To alleviate this problem, we show how to obtain null distributions for the two measures to assess the statistical significance of reported results.
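The significance assessment described in the abstract can be illustrated with a Monte Carlo sketch. This is not the authors' procedure: it assumes a simple null model in which a detector places the same number of detections uniformly at random over the sequence, and it uses an assumed definition of tolerant precision (a detection is correct if a true anomaly lies within `tau` steps). The function names, the data, and the simulation settings are all hypothetical.

```python
import random


def tolerant_precision(true_times, pred_times, tau):
    """Fraction of detections with a true anomaly within tau steps (assumed definition)."""
    if not pred_times:
        return 0.0
    hits = sum(any(abs(p - t) <= tau for t in true_times) for p in pred_times)
    return hits / len(pred_times)


def null_precisions(true_times, n_steps, n_detections, tau, n_sim=2000, seed=0):
    """Simulate tolerant precision of a detector that fires uniformly at random.

    Assumed null model: n_detections distinct time steps are drawn without
    replacement from range(n_steps) in each simulation run.
    """
    rng = random.Random(seed)
    return [
        tolerant_precision(true_times, rng.sample(range(n_steps), n_detections), tau)
        for _ in range(n_sim)
    ]


def empirical_p_value(observed, null_values):
    """One-sided Monte Carlo p-value with the usual +1 correction."""
    exceed = sum(v >= observed for v in null_values)
    return (1 + exceed) / (1 + len(null_values))


# Illustrative data, not taken from the paper.
true_times = [100, 300, 500, 700, 900]
pred_times = [102, 298, 640]
n_steps = 1000

for tau in (0, 5, 50):
    observed = tolerant_precision(true_times, pred_times, tau)
    null = null_precisions(true_times, n_steps, len(pred_times), tau)
    mean_null = sum(null) / len(null)
    p_value = empirical_p_value(observed, null)
    print(f"tau={tau:3d}  observed={observed:.3f}  null mean={mean_null:.3f}  p={p_value:.4f}")
```

In this sketch the average precision of the random detector grows with `tau`, because a larger tolerance window covers more of the sequence. A high tolerant precision is therefore only informative when compared against what chance alone would achieve at the same tolerance, which is what the empirical p-value summarizes.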
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Seismology and Earthquake Studies · Time Series Analysis and Forecasting
