Concept Drift and Covariate Shift Detection Ensemble with Lagged Labels
Yiming Xu, Diego Klabjan

TL;DR
This paper introduces an ensemble method for detecting concept drift and covariate shift in data streams, utilizing multiple signals and lagged labels to improve detection accuracy and retraining decisions.
Contribution
It proposes a novel ensemble approach that combines multiple data signals and accounts for lagged labels, addressing key limitations of existing drift detection methods.
Findings
Outperforms state-of-the-art methods significantly
Effective on both structured and unstructured data
Handles delayed label availability for better detection
Abstract
In model serving, keeping one fixed model throughout the entire, often life-long, inference process is usually detrimental to model performance: the data distribution evolves over time, so a model trained on historical data becomes unreliable. It is important to detect changes and retrain the model in time. Existing methods generally have three weaknesses: 1) they use only the classification error rate as a signal, 2) they assume ground truth labels are available immediately after the features of a sample are received, and 3) they cannot decide what data to use to retrain the model when a change occurs. We address the first problem by utilizing six different signals to capture a wide range of characteristics of the data, and we address the second problem by allowing labels to lag, so that the labels for given features are received after a delay. For the third problem, our proposed method…
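To make the lagged-label setting concrete, here is a minimal Python sketch. It is not the authors' ensemble: it tracks a single signal (classification error rate) and raises an alarm with a plain threshold on the gap between a reference window and a recent window. The class name `LaggedErrorMonitor`, the helper `first_alarm_time`, and the `window`, `threshold`, and `lag` parameters are all hypothetical, chosen only for illustration.

```python
from collections import deque


class LaggedErrorMonitor:
    """Hypothetical helper: tracks the error rate when labels arrive late."""

    def __init__(self, window=50, threshold=0.2):
        self.pending = {}                      # sample id -> prediction awaiting its label
        self.reference = deque(maxlen=window)  # earliest errors, frozen once full
        self.recent = deque(maxlen=window)     # sliding window of the latest errors
        self.window = window
        self.threshold = threshold

    def predict(self, sample_id, prediction):
        # Features arrive first: store the prediction until its label shows up.
        self.pending[sample_id] = prediction

    def label(self, sample_id, true_label):
        # The label arrives after a lag: score the stored prediction.
        prediction = self.pending.pop(sample_id)
        error = int(prediction != true_label)
        if len(self.reference) < self.window:
            self.reference.append(error)
        else:
            self.recent.append(error)
        return self.drift_detected()

    def drift_detected(self):
        # Assumed rule: alarm when the recent error rate exceeds the
        # reference error rate by more than `threshold`.
        if len(self.recent) < self.window:
            return False
        ref_rate = sum(self.reference) / len(self.reference)
        rec_rate = sum(self.recent) / len(self.recent)
        return rec_rate - ref_rate > self.threshold


def first_alarm_time(lag=5, change_at=150, steps=300):
    """Simulate a stream whose true label flips at `change_at`.

    The model always predicts 0, and the label of sample s arrives at time s + lag.
    Returns the time step of the first alarm, or None if no alarm is raised.
    """
    monitor = LaggedErrorMonitor(window=50, threshold=0.2)
    for t in range(steps):
        monitor.predict(t, 0)
        if t >= lag:
            s = t - lag
            true_label = 0 if s < change_at else 1
            if monitor.label(s, true_label):
                return t
    return None
```

In this toy stream the alarm can only fire once enough post-change labels have arrived, so a longer label lag delays detection by the same number of steps. That delay is one reason a detector limited to the error rate is restrictive when labels are late.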
Taxonomy
Topics: Data Stream Mining Techniques · Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications
