Expected Similarity Estimation for Large-Scale Batch and Streaming Anomaly Detection

Markus Schneider; Wolfgang Ertel; Fabio Ramos

arXiv:1601.06602·cs.LG·June 7, 2016

TL;DR

This paper introduces EXPoSE, a kernel-based anomaly detection algorithm for large-scale datasets and data streams. It learns in linear time offline, updates and predicts in constant time per instance online, needs only constant memory, and can adapt to concept drift.

Contribution

The paper presents EXPoSE, a novel kernel-based method for anomaly detection that learns in linear time offline and in constant time per instance online, which makes it suitable for large-scale and streaming data.

Findings

01

EXPoSE achieves accuracy competitive with state-of-the-art methods on several real datasets.

02

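Offline (batch) learning takes linear time; online (incremental) learning takes constant time per instance and model update, and predictions take constant time.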

03

It requires only constant memory and can adapt to concept drift.

Abstract

We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with…
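The abstract describes the score as an inner product between a new point and a Hilbert space embedding of the regular data. A minimal sketch of that idea follows, under assumptions this summary does not state: a Gaussian kernel approximated with random Fourier features, so the embedding becomes a fixed-length mean vector. The class `ExposeSketch` and the parameters `n_features` and `gamma` are hypothetical names chosen for illustration, not the authors' implementation.

```python
import numpy as np


class ExposeSketch:
    """Hypothetical EXPoSE-style scorer (illustrative, not the authors' code)."""

    def __init__(self, dim, n_features=500, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: approximate k(x, y) = exp(-gamma * ||x - y||^2) with
        # random Fourier features; frequencies are drawn from N(0, 2 * gamma).
        self.W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(dim, n_features))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        # Approximate mean embedding: fixed size, independent of stream length.
        self.mean = np.zeros(n_features)
        self.n = 0

    def _phi(self, X):
        # Explicit feature map; rows of the result correspond to rows of X.
        X = np.atleast_2d(X)
        return self.scale * np.cos(X @ self.W + self.b)

    def fit(self, X):
        """Batch learning: a single pass, linear in the number of rows."""
        Z = self._phi(X)
        self.mean = Z.mean(axis=0)
        self.n = Z.shape[0]
        return self

    def partial_fit(self, x):
        """Online learning: constant-time running-mean update for one point."""
        self.n += 1
        self.mean += (self._phi(x)[0] - self.mean) / self.n
        return self

    def score(self, X):
        """Expected similarity to the data seen so far; low suggests anomaly."""
        return self._phi(X) @ self.mean


# Usage: fit on points from a standard normal, then score two queries.
rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(2000, 2))
model = ExposeSketch(dim=2, gamma=0.5).fit(normal)
inlier, outlier = model.score([[0.0, 0.0], [8.0, 8.0]])
```

In this sketch the point near the bulk of the data should score well above the distant one. Because the model state is a single vector of length `n_features`, memory and per-prediction cost do not grow with the number of training points, which is one way to realize the constant-time and constant-memory behavior the abstract describes. Concept drift handling is omitted here because the summary does not specify the proposed methods.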

Code & Models

Repositories

numenta/NAB