# On the Reliable Detection of Concept Drift from Streaming Unlabeled Data

**Authors:** Tegjyot Singh Sethi, Mehmed Kantardzic

arXiv: 1704.00023 · 2017-04-04

## TL;DR

This paper introduces MD3, an unsupervised, model-independent algorithm that reliably detects concept drift in streaming data by monitoring classifier uncertainty, reducing false alarms and improving detection credibility.

## Contribution

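The paper proposes Margin Density Drift Detection (MD3), an unsupervised drift detection method that tracks the number of samples falling in a classifier's uncertainty region. Because it brings the characteristics of the learned classifier into the detection process, which existing unsupervised techniques leave out, it produces fewer false alarms.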

## Key findings

- MD3 produces significantly fewer false alarms than unsupervised feature-based drift detectors.
- In experiments on 6 drift-induced datasets and 4 additional cybersecurity datasets, MD3 reliably detects drifts.
- MD3 is label-efficient and broadly applicable in real-world streaming scenarios.

## Abstract

Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to drop over time, thereby making it obsolete. To be of any real use, these classifiers need to detect drifts and adapt to them over time. Detecting drifts has traditionally been approached as a supervised task, with labeled data constantly being used to validate the learned model. Although effective in detecting drifts, these techniques are impractical, as labeling is a difficult, costly and time-consuming activity. On the other hand, unsupervised change detection techniques are unreliable, as they produce a large number of false alarms. The inefficacy of the unsupervised techniques stems from the exclusion of the characteristics of the learned classifier from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in the uncertainty region of a classifier as a metric to detect drift. The MD3 algorithm is a distribution-independent, application-independent, model-independent, unsupervised and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on 6 drift-induced datasets and 4 additional datasets from the cybersecurity domain demonstrates that the MD3 approach can reliably detect drifts, with significantly fewer false alarms compared to unsupervised feature-based drift detectors. The reduction in false alarms enables the signaling of drifts only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme which is credible, label-efficient and general in its applicability.
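To make the idea of tracking samples in the uncertainty region concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical linear classifier with score `w.x + b`, treats the region `|score| <= 1` as the uncertainty region, and flags a possible drift when the fraction of samples in that region moves away from a reference value by more than a user-chosen `tolerance`. The function names, the margin width, and the tolerance are all illustrative assumptions.

```python
from typing import Iterable, Sequence


def linear_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Signed decision score w.x + b of a hypothetical linear classifier."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def margin_density(
    weights: Sequence[float],
    bias: float,
    samples: Iterable[Sequence[float]],
    margin: float = 1.0,  # assumed width of the uncertainty region
) -> float:
    """Fraction of unlabeled samples whose |score| lies inside the margin."""
    batch = list(samples)
    if not batch:
        raise ValueError("need at least one sample")
    inside = sum(1 for x in batch if abs(linear_score(weights, bias, x)) <= margin)
    return inside / len(batch)


def drift_suspected(reference_md: float, current_md: float, tolerance: float) -> bool:
    """Flag a possible drift when margin density deviates beyond the tolerance.

    The tolerance is an illustrative assumption, not a value from the paper.
    """
    return abs(current_md - reference_md) > tolerance


# Toy data: the classifier only looks at the first feature.
weights, bias = (1.0, 0.0), 0.0
reference = [(-3.0, 0.0), (-2.0, 1.0), (-0.5, 0.0), (2.0, 1.0), (3.0, 0.0)]
drifted = [(-0.8, 0.0), (-0.2, 1.0), (0.4, 0.0), (0.9, 1.0), (3.0, 0.0)]

md_ref = margin_density(weights, bias, reference)
md_new = margin_density(weights, bias, drifted)
print(md_ref, md_new, drift_suspected(md_ref, md_new, tolerance=0.3))
```

No labels are used anywhere in the sketch: only the classifier's scores on incoming samples are needed, which is what makes this style of detection unsupervised. A feature-based detector would instead compare the raw feature distributions and could raise an alarm for changes that never touch the region where the classifier is uncertain.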

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.00023/full.md

## Figures

65 figures with captions in the complete paper: https://tomesphere.com/paper/1704.00023/full.md

## References

62 references — full list in the complete paper: https://tomesphere.com/paper/1704.00023/full.md

---
Source: https://tomesphere.com/paper/1704.00023