Detecting Concept Drift in the Presence of Sparsity -- A Case Study of Automated Change Risk Assessment System
Vishwas Choudhary, Binay Gupta, Anirban Chatterjee, Subhadip Paul, Kunal Banerjee, Vijay Agneeswaran

TL;DR
This paper systematically studies how concept drift detection can be effectively performed in datasets with missing values, proposing an ensemble approach that improves detection accuracy in real-world applications.
Contribution
It provides a comprehensive analysis of missing data patterns, imputation methods, and drift detection techniques, culminating in an ensemble method that enhances detection performance.
Findings
No single drift detector outperforms others across all metrics.
An ensemble of multiple detectors yields better overall performance than any single detector.
Effective drift detection is achievable despite data sparsity.
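The idea behind the findings above (impute missing values, run several drift detectors, and combine their verdicts) can be illustrated with a minimal sketch. This is not the paper's method: the summary does not say which imputers, detectors, or combination rule the authors use. Mean imputation, the three toy detectors, their thresholds, the majority vote, and every function name below are assumptions chosen only for illustration.

```python
from bisect import bisect_right
from statistics import mean, pstdev

def mean_impute(values):
    """Replace None entries with the mean of the observed entries (assumed imputer)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def mean_shift_detector(ref, cur, z_threshold=3.0):
    """Flag drift when the current mean is far from the reference mean."""
    spread = pstdev(ref) or 1e-12  # guard against a zero spread
    z = abs(mean(cur) - mean(ref)) / (spread / len(cur) ** 0.5)
    return z > z_threshold

def ks_detector(ref, cur, d_threshold=0.5):
    """Flag drift when the two-sample KS statistic exceeds a fixed threshold."""
    ref_sorted, cur_sorted = sorted(ref), sorted(cur)
    d = max(
        abs(bisect_right(ref_sorted, x) / len(ref_sorted)
            - bisect_right(cur_sorted, x) / len(cur_sorted))
        for x in ref_sorted + cur_sorted
    )
    return d > d_threshold

def spread_detector(ref, cur, ratio_threshold=2.0):
    """Flag drift when the standard deviation changes by a large factor."""
    s_ref = pstdev(ref) or 1e-12
    s_cur = pstdev(cur) or 1e-12
    return max(s_ref / s_cur, s_cur / s_ref) > ratio_threshold

def ensemble_drift(ref, cur, detectors):
    """Majority vote over detectors, applied to mean-imputed windows."""
    ref_f, cur_f = mean_impute(ref), mean_impute(cur)
    votes = [det(ref_f, cur_f) for det in detectors]
    return sum(votes) > len(votes) / 2, votes

# Hypothetical toy windows; None marks a missing value.
DETECTORS = [mean_shift_detector, ks_detector, spread_detector]
reference = [1.0, 1.2, None, 0.9, 1.1, 1.0, None, 0.8]
shifted = [3.0, None, 3.2, 2.9, 3.1, None, 3.0, 2.8]

print(ensemble_drift(reference, shifted, DETECTORS))
print(ensemble_drift(reference, reference, DETECTORS))
```

In this toy setup the individual detectors can disagree on the shifted window (the spread detector sees no change in variance), and the vote resolves the disagreement. That is the general motivation for combining detectors, though a real system would likely use statistically calibrated thresholds rather than the fixed ones assumed here.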
Abstract
Missing values, widely referred to as sparsity in the literature, are a common characteristic of many real-world datasets. Many imputation methods have been proposed to address this problem of data incompleteness or sparsity. However, the accuracy of a data imputation method for a given feature or a set of features in a dataset is highly dependent on the distribution of the feature values and its correlation with other features. Another problem that plagues industry deployments of machine learning (ML) solutions is concept drift detection, which becomes more challenging in the presence of missing values. Although data imputation and concept drift detection have been studied extensively, little work has attempted a combined study of the two phenomena, i.e., concept drift detection in the presence of sparsity. In this work, we carry out a systematic study of the following: (i) different…
Taxonomy
Topics: Data Stream Mining Techniques · Air Quality Monitoring and Forecasting · Machine Learning and Data Classification
