Novelty Detection in Sequential Data by Informed Clustering and Modeling
Linara Adilova, Siming Chen, Michael Kamp

TL;DR
This paper introduces an informed clustering approach for novelty detection in discrete sequences, leveraging domain expertise and LSTM models to improve detection accuracy over traditional methods.
Contribution
The paper presents a novel informed clustering method combined with LSTM modeling that enhances novelty detection in discrete sequences, outperforming existing approaches.
Findings
Informed clustering outperforms automatic clustering.
Decomposition improves detection even though each cluster has less training data.
The approach outperforms state-of-the-art methods in real-world scenarios.
Abstract
Novelty detection in discrete sequences is a challenging task, since deviations from the process generating the normal data are often small or intentionally hidden. Novelties can be detected by modeling normal sequences and measuring the deviations of a new sequence from the model predictions. However, in many applications data is generated by several distinct processes, so that models trained on all the data tend to over-generalize and novelties remain undetected. We propose to approach this challenge through decomposition: by clustering the data we break down the problem, obtaining a simpler modeling task in each cluster, which can be modeled more accurately. However, this comes with a trade-off, since the amount of training data per cluster is reduced. This is a particular problem for discrete sequences, where state-of-the-art models are data-hungry. The success of this approach thus…
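The decomposition described in the abstract (cluster the normal data, fit one model per cluster, score a new sequence by how far it deviates from its cluster's model) can be illustrated with a minimal sketch. This is not the authors' implementation: a smoothed first-order Markov model stands in for the LSTM so that the example runs with the standard library alone, and `cluster_of` is a hypothetical rule standing in for domain-informed clustering. All function names and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_markov(sequences, alphabet, alpha=1.0):
    """Fit a first-order Markov model with add-alpha smoothing.

    Stand-in for the sequence model; the paper uses LSTMs instead.
    Returns a function prob(prev, cur) giving the transition probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
    size = len(alphabet)

    def prob(prev, cur):
        row = counts.get(prev, {})
        total = sum(row.values())
        return (row.get(cur, 0.0) + alpha) / (total + alpha * size)

    return prob


def novelty_score(seq, prob):
    """Mean negative log-likelihood of the transitions in seq.

    Higher values mean the sequence deviates more from the model.
    """
    steps = list(zip(seq, seq[1:]))
    if not steps:
        return 0.0
    return -sum(math.log(prob(p, c)) for p, c in steps) / len(steps)


def fit_per_cluster(sequences, cluster_of, alphabet):
    """Group sequences by cluster key and fit one model per cluster."""
    clusters = defaultdict(list)
    for seq in sequences:
        clusters[cluster_of(seq)].append(seq)
    return {key: fit_markov(seqs, alphabet) for key, seqs in clusters.items()}


def cluster_of(seq):
    """Hypothetical informed-clustering rule: assign by the first symbol."""
    return 0 if seq[0] in "ab" else 1


# Toy "normal" data generated by two distinct processes.
alphabet = "abcd"
normal = ["ababab", "bababa", "cdcdcd", "dcdcdc"]
models = fit_per_cluster(normal, cluster_of, alphabet)

# Score a normal and a suspicious sequence against the model of their cluster.
typical = "ababab"
suspicious = "abcdab"
typical_score = novelty_score(typical, models[cluster_of(typical)])
suspicious_score = novelty_score(suspicious, models[cluster_of(suspicious)])
print(typical_score, suspicious_score)
```

A sequence would be flagged as novel when its score exceeds a threshold chosen on held-out normal data. The trade-off named in the abstract shows up directly here: each per-cluster model is fitted on only half of the toy data.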
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · Data Visualization and Analytics
