TL;DR
This paper critiques current time series anomaly detection benchmarks, revealing flaws that may make published comparisons unreliable and recent progress illusory, and introduces a new benchmark archive intended to support more reliable evaluation.
Contribution
The paper identifies four key flaws in existing benchmarks and introduces the UCR Time Series Anomaly Archive to enable more accurate evaluations.
Findings
Most individual exemplars in the popular benchmark datasets (Yahoo, Numenta, NASA, etc.) suffer from one or more of four flaws.
Many published algorithm comparisons may be unreliable.
The new UCR Archive aims to improve evaluation reliability.
Abstract
Time series anomaly detection has been a perennially important topic in data science, with papers dating back to the 1950s. However, in recent years there has been an explosion of interest in this topic, much of it driven by the success of deep learning in other domains and for other time series tasks. Most of these papers test on one or more of a handful of popular benchmark datasets, created by Yahoo, Numenta, NASA, etc. In this work we make a surprising claim. The majority of the individual exemplars in these datasets suffer from one or more of four flaws. Because of these four flaws, we believe that many published comparisons of anomaly detection algorithms may be unreliable, and more importantly, much of the apparent progress in recent years may be illusionary. In addition to demonstrating these claims, with this paper we introduce the UCR Time Series Anomaly Archive. We believe…
