Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection
Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo

TL;DR
This paper evaluates nearest-neighbor (NN) methods for anomaly detection through simulations and real-data analysis, and supports them with theoretical guarantees, showing that they are competitive with state-of-the-art algorithms and relating their behavior to factors such as data dimensionality.
Contribution
It offers a comprehensive analysis of NN-based anomaly detection, including empirical comparisons, performance on real datasets, and theoretical guarantees using the distance-to-measure framework.
Findings
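On benchmark synthetic datasets, NN methods compare favorably with other state-of-the-art anomaly detection algorithms.
On real datasets, performance is related to the dimensionality of the problem.
Finite-sample uniform guarantees for the empirical DTM are established and used to derive misclassification rates.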
Abstract
Nearest-neighbor (NN) procedures are well studied and widely used in both supervised and unsupervised learning problems. In this paper we are concerned with investigating the performance of NN-based methods for anomaly detection. We first show through extensive simulations that NN methods compare favorably to some of the other state-of-the-art algorithms for anomaly detection based on a set of benchmark synthetic datasets. We further consider the performance of NN methods on real datasets, and relate it to the dimensionality of the problem. Next, we analyze the theoretical properties of NN-methods for anomaly detection by studying a more general quantity called distance-to-measure (DTM), originally developed in the literature on robust geometric and topological inference. We provide finite-sample uniform guarantees for the empirical DTM and use them to derive misclassification rates for…
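To make the connection between NN scores and the DTM concrete, here is a minimal Python sketch. It is not the authors' code. It assumes the common empirical DTM with exponent 2 from the robust topological inference literature, namely the root mean square of the distances from a query point to its k nearest sample points; the paper may work with a more general form. The function names (knn_distances, kth_nn_score, empirical_dtm), the synthetic data, and the choice k = 10 are all hypothetical and chosen only for illustration.

```python
import numpy as np

def knn_distances(train, query, k):
    """Sorted Euclidean distances from each query point to its k nearest training points."""
    diffs = query[:, None, :] - train[None, :, :]      # shape (n_query, n_train, dim)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))          # shape (n_query, n_train)
    return np.sort(dists, axis=1)[:, :k]                # keep the k smallest per query

def kth_nn_score(train, query, k):
    """Classical kNN anomaly score: distance to the k-th nearest neighbor."""
    return knn_distances(train, query, k)[:, -1]

def empirical_dtm(train, query, k):
    """Empirical DTM (exponent 2, assumed form): RMS of the k nearest-neighbor distances."""
    d = knn_distances(train, query, k)
    return np.sqrt((d ** 2).mean(axis=1))

# Illustrative data (assumption): a 2-D standard Gaussian sample as the "normal" class.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2))

# One query point near the bulk of the data and one far away from it.
query = np.array([[0.0, 0.0], [6.0, 6.0]])

dtm_scores = empirical_dtm(train, query, k=10)
knn_scores = kth_nn_score(train, query, k=10)
```

Both scores are larger for the distant point than for the central one, so thresholding either gives a simple anomaly detector. Because the DTM averages over all k neighbors rather than reading off only the k-th, it never exceeds the k-th NN distance and is less sensitive to a single unusual neighbor. The query points here are not part of the training sample; when scoring training points themselves, the zero distance to self would need to be excluded.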
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · Advanced Statistical Methods and Models
