Time to Retrain? Detecting Concept Drifts in Machine Learning Systems
Tri Minh Triet Pham, Karthikeyan Premkumar, Mohamed Naili, Jinqiu Yang

TL;DR
This paper introduces CDSeer, a novel model-agnostic technique for detecting concept drift in machine learning systems, significantly reducing manual labeling effort while maintaining high detection accuracy across diverse datasets.
Contribution
The paper presents CDSeer, a new semi-supervised, model-agnostic concept drift detection method that outperforms existing techniques in precision, recall, and labeling efficiency.
Findings
CDSeer achieves 57.1% higher precision with 99% fewer labels than SOTA methods.
It performs comparably to supervised methods requiring full data labeling.
It proves effective across eight diverse datasets and in an industrial deployment.
Abstract
With the boom of machine learning (ML) techniques, software practitioners build ML systems to process the massive volume of streaming data for diverse software engineering tasks such as failure prediction in AIOps. Trained using historical data, such ML models encounter performance degradation caused by concept drift, i.e., data and inter-relationship (concept) changes between training and production. It is essential to use concept drift detection to monitor the deployed ML models and re-train the ML models when needed. In this work, we explore applying state-of-the-art (SOTA) concept drift detection techniques on synthetic and real-world datasets in an industrial setting. Such an industrial setting requires minimal manual effort in labeling and maximal generality in ML model architecture. We find that current SOTA semi-supervised methods not only require significant labeling effort but…
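To make the idea of monitoring a deployed model for concept drift concrete, here is a minimal sketch of a fully supervised, error-rate-based detector in the style of the classic Drift Detection Method (DDM). It is not CDSeer and is not taken from the paper. It assumes a true label is available for every prediction, which is exactly the labeling cost the paper aims to reduce. The names `SimpleDDM`, `first_drift_index`, `min_samples`, and `drift_level` are hypothetical and chosen only for illustration.

```python
import math


class SimpleDDM:
    """Illustrative DDM-style detector over a stream of 0/1 prediction errors.

    Assumption: every prediction is eventually labeled, so an error
    indicator (1 = misprediction, 0 = correct) exists for each sample.
    """

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples  # warm-up before any drift check
        self.drift_level = drift_level  # number of std devs that signals drift
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0               # running error rate
        self.best = float("inf")   # lowest p + s observed so far
        self.p_min = 0.0
        self.s_min = 0.0

    def update(self, error):
        """Consume one error indicator; return True if drift is signaled."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return False
        if self.p + s < self.best:
            self.best = self.p + s
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start fresh, e.g. after retraining the model
            return True
        return False


def first_drift_index(errors):
    """Return the index of the first drift signal, or None if there is none."""
    detector = SimpleDDM()
    for i, e in enumerate(errors):
        if detector.update(e):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: 10% error rate, then an abrupt jump to 50%.
    stable = [1 if i % 10 == 0 else 0 for i in range(1000)]
    drifted = [i % 2 for i in range(1000)]
    print(first_drift_index(stable + drifted))
```

On this synthetic stream the detector stays silent during the stable phase and signals drift within roughly a hundred samples of the change. A real deployment would feed it the model's per-sample errors and trigger retraining on a signal. The need for a label on every sample is what makes this kind of supervised baseline expensive in the industrial setting the abstract describes.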
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research · Network Security and Intrusion Detection
