Data-Efficient and Interpretable Tabular Anomaly Detection
Chun-Hao Chang, Jinsung Yoon, Sercan Arik, Madeleine Udell, Tomas Pfister

TL;DR
This paper introduces DIAD, a novel, interpretable, and data-efficient anomaly detection framework for tabular data that effectively incorporates small amounts of labeled data and provides explanations for detected anomalies.
Contribution
The paper proposes DIAD, a white-box model based on Generalized Additive Models, for anomaly detection that is both interpretable and capable of semi-supervised learning with minimal labeled data.
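To make the interpretability claim concrete, the sketch below shows how an additive anomaly score splits into per-feature contributions that can be read off directly. It is not DIAD: as a stand-in for the paper's GAM and partial identification objective, it uses simple per-feature histograms (in the spirit of histogram-based outlier scoring). The function names, the bin count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

# Illustrative stand-in, NOT the DIAD method: per-feature histogram scores
# replace the paper's GAM and partial identification objective.

def fit_feature_histograms(X, n_bins=10):
    """Fit one histogram per feature; returns a list of (bin_edges, log_probs)."""
    models = []
    for j in range(X.shape[1]):
        counts, edges = np.histogram(X[:, j], bins=n_bins)
        # Laplace smoothing keeps empty bins at a finite (large) score.
        probs = (counts + 1.0) / (counts.sum() + n_bins)
        models.append((edges, np.log(probs)))
    return models

def feature_contributions(models, X):
    """Per-feature contribution: negative log-probability of each value's bin."""
    contribs = np.empty(X.shape, dtype=float)
    for j, (edges, log_probs) in enumerate(models):
        # Interior edges map values to bin indices; out-of-range values
        # fall into the first or last bin.
        idx = np.digitize(X[:, j], edges[1:-1])
        contribs[:, j] = -log_probs[idx]
    return contribs

# Synthetic data (assumed): 500 rows, 3 features, standard normal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Row 0 is typical in features 0 and 2 but extreme in feature 1.
X[0] = [0.0, 8.0, 0.0]

models = fit_feature_histograms(X)
contribs = feature_contributions(models, X)

# Additive structure: the total score is the sum of feature contributions.
scores = contribs.sum(axis=1)

# The explanation for row 0 is the feature with the largest contribution.
top_feature = int(np.argmax(contribs[0]))
print("row 0 contributions:", contribs[0], "top feature:", top_feature)
```

Because the score is a plain sum, each term shows how much one feature contributed to flagging a row. A GAM-based detector offers the same kind of per-feature readout, with learned shape functions in place of the histograms.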
Findings
DIAD outperforms previous methods in unsupervised and semi-supervised settings.
Incorporating 5 labeled anomalies improves AUC from 86.2% to 89.4%.
DIAD provides meaningful interpretations of anomalies.
Abstract
Anomaly detection (AD) plays an important role in numerous applications. We focus on two understudied aspects of AD that are critical for integration into real-world applications. First, most AD methods cannot incorporate labeled data that are often available in practice in small quantities and can be crucial to achieve high AD accuracy. Second, most AD methods are not interpretable, a bottleneck that prevents stakeholders from understanding the reason behind the anomalies. In this paper, we propose a novel AD framework that adapts a white-box model class, Generalized Additive Models, to detect anomalies using a partial identification objective which naturally handles noisy or heterogeneous features. In addition, the proposed framework, DIAD, can incorporate a small amount of labeled data to further boost anomaly detection performances in semi-supervised settings. We demonstrate the…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Data-Driven Disease Surveillance
