ATD: Anomalous Topic Discovery in High Dimensional Discrete Data
Hossein Soleimani, David J. Miller

TL;DR
This paper introduces ATD, a novel algorithm for detecting anomalous clusters and their shared features in high-dimensional discrete data, especially effective when anomalies are confined to small feature subsets.
Contribution
The paper presents a new method for group anomaly detection in high-dimensional discrete data, focusing on identifying anomalous topics and salient features using topic models.
Findings
Accurately detects anomalous topics in synthetic and real text data.
Outperforms standard group and individual anomaly detection methods.
Identifies salient features associated with anomalies.
Abstract
We propose an algorithm for detecting patterns exhibited by anomalous clusters in high dimensional discrete data. Unlike most anomaly detection (AD) methods, which detect individual anomalies, our proposed method detects groups (clusters) of anomalies, i.e., sets of points which collectively exhibit abnormal patterns. In many applications this can lead to better understanding of the nature of the atypical behavior and to identifying the sources of the anomalies. Moreover, we consider the case where the atypical patterns are exhibited on only a small (salient) subset of the very high dimensional feature space. Individual AD techniques and techniques that detect anomalies using all the features typically fail to detect such anomalies, but our method can detect such instances collectively, discover the shared anomalous patterns exhibited by them, and identify the subsets of salient features. In…
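The intuition behind detecting anomalies collectively on a small salient feature subset can be illustrated with a toy example. The sketch below is not the ATD algorithm (which, per the summary above, builds on topic models). It is a minimal illustration under stated assumptions: synthetic bag-of-words documents, a Zipf-like null word distribution, a planted set of five salient words, and a candidate group that is already known. The helper `pooled_feature_zscores` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def pooled_feature_zscores(group_counts, null_probs):
    """Score each feature by how far a group's pooled count deviates
    from its expectation under the null word distribution."""
    pooled = group_counts.sum(axis=0)       # total count per word across the group
    expected = pooled.sum() * null_probs    # expected count per word under the null
    return (pooled - expected) / np.sqrt(expected)

rng = np.random.default_rng(0)
vocab_size, doc_len = 500, 200
salient = np.arange(100, 105)  # hypothetical planted salient words

# Assumed Zipf-like null distribution over the vocabulary.
null_true = 1.0 / np.arange(1, vocab_size + 1)
null_true /= null_true.sum()

# Anomalous documents move 10% of their probability mass onto the salient words.
anom_probs = 0.9 * null_true
anom_probs[salient] += 0.1 / salient.size

reference_docs = rng.multinomial(doc_len, null_true, size=500)
anomalous_docs = rng.multinomial(doc_len, anom_probs, size=20)

# Estimate the null from reference documents, with add-one smoothing
# so that every expected count is strictly positive.
null_est = (reference_docs.sum(axis=0) + 1.0) / (reference_docs.sum() + vocab_size)

# Pool the candidate group's counts and rank features by deviation from the null.
z = pooled_feature_zscores(anomalous_docs, null_est)
top_features = np.sort(np.argsort(z)[-salient.size:])
print(top_features)
```

Pooling counts over the group concentrates the evidence on the few words the documents share, whereas a per-document score computed over all 500 words spreads the same evidence thin. Discovering the group itself, which this sketch assumes is given, is the harder problem the paper addresses.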
