Nonparametric Detection of Anomalous Data Streams
Shaofeng Zou, Yingbin Liang, H. Vincent Poor, Xinghua Shi

TL;DR
This paper develops a nonparametric method using maximum mean discrepancy for detecting anomalous data streams without prior knowledge of distributions, establishing conditions for exponential consistency and optimality.
Contribution
It introduces a distribution-free test for anomaly detection in data streams that is proven to be order-level optimal under certain conditions.
Findings
With the number s of anomalous sequences known, the proposed test is exponentially consistent when the sample size m exceeds a constant multiple of log n.
In numerical experiments, the test performs better than or comparably to existing methods.
Optimality bounds are established for the sample size needed for reliable detection.
Abstract
A nonparametric anomalous hypothesis testing problem is investigated, in which there are n sequences in total, s of which are anomalous and are to be detected. Each typical sequence contains m independent and identically distributed (i.i.d.) samples drawn from a distribution p, whereas each anomalous sequence contains m i.i.d. samples drawn from a distribution q that is distinct from p. The distributions p and q are assumed to be unknown in advance. Distribution-free tests are constructed using maximum mean discrepancy as the metric, which is based on mean embeddings of distributions into a reproducing kernel Hilbert space. The probability of error is bounded as a function of the sample size m, the number s of anomalous sequences and the number n of sequences. It is then shown that with s known, the constructed test is exponentially consistent if m is greater than a constant factor of log n,…
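The setup in the abstract can be illustrated with a minimal sketch, assuming a Gaussian kernel and a leave-one-out ranking: each sequence is scored by an unbiased estimate of the squared maximum mean discrepancy between its own samples and the pooled samples of all other sequences, and the s highest-scoring sequences are declared anomalous. This is an illustration under those assumptions, not necessarily the exact test constructed in the paper; the function names, the kernel choice, and the `bandwidth` parameter are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (a, d) and rows of y (b, d)."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x (a, d) and y (b, d)."""
    a, b = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Within-sample terms exclude the diagonal (i == j) entries.
    term_xx = (kxx.sum() - np.trace(kxx)) / (a * (a - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (b * (b - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()


def detect_anomalous(sequences, s, bandwidth=1.0):
    """Return the indices of the s sequences with the largest leave-one-out MMD.

    sequences: array of shape (n, m, d), n sequences of m samples each.
    s: assumed-known number of anomalous sequences.
    """
    n, _, d = sequences.shape
    scores = np.empty(n)
    for k in range(n):
        # Pool the samples of every sequence except the k-th.
        rest = np.delete(sequences, k, axis=0).reshape(-1, d)
        scores[k] = mmd2_unbiased(sequences[k], rest, bandwidth)
    detected = np.sort(np.argsort(scores)[-s:])
    return detected, scores
```

Calling `detect_anomalous(sequences, s)` on an array of shape (n, m, d) returns the sorted indices of the flagged sequences together with all n scores. The ranking needs no knowledge of p or q, which matches the distribution-free setting; the kernel bandwidth is the one tuning choice this sketch leaves to the user.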
Taxonomy
Topics: Advanced Statistical Process Monitoring · Statistical Methods and Inference · Machine Learning and Algorithms
