When to retrain a machine learning model
Regol Florence, Schwinn Leo, Sprague Kyle, Coates Mark, Markovich Thomas

TL;DR
This paper introduces a principled, uncertainty-based approach for determining optimal retraining times of machine learning models amid evolving data, outperforming existing methods across multiple datasets.
Contribution
The paper gives a comprehensive formulation of the retraining decision problem and proposes an uncertainty-driven method that forecasts how model performance will evolve, so that retraining can be timed accordingly.
Findings
Outperforms existing baselines on 7 datasets
Effectively detects when to retrain models under distribution shift
Provides a practical solution for real-world model maintenance
Abstract
A significant challenge in maintaining real-world machine learning models is responding to the continuous and unpredictable evolution of data. Most practitioners face a difficult question: when should I retrain or update my machine learning model? This seemingly straightforward problem is particularly challenging for three reasons: 1) decisions must be made based on very limited information, since we usually have access to only a few examples, 2) the nature, extent, and impact of the distribution shift are unknown, and 3) it involves specifying a cost ratio between retraining and poor performance, which can be hard to characterize. Existing works address certain aspects of this problem, but none offer a comprehensive solution. Distribution shift detection falls short as it cannot account for the cost trade-off; the scarcity of the data, paired with its unusual structure, makes it…
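The cost trade-off mentioned in the abstract can be illustrated with a small sketch. This is not the paper's method, which the truncated abstract does not spell out. It is a hypothetical decision rule that assumes the forecast of the cumulative performance gap (how much worse the current model will be than a retrained one over the remaining horizon) is Gaussian, and that `retrain_cost` is already expressed in the same units through the cost ratio. The function names, the Gaussian assumption, and the `threshold` parameter are all illustrative assumptions.

```python
from statistics import NormalDist


def retrain_probability(total_gap_mean: float, total_gap_std: float,
                        retrain_cost: float) -> float:
    """Probability that the forecast cumulative performance gap exceeds
    the cost of retraining, under an assumed Gaussian forecast."""
    if total_gap_std == 0:
        # Degenerate forecast: no uncertainty, so the comparison is exact.
        return 1.0 if total_gap_mean > retrain_cost else 0.0
    forecast = NormalDist(mu=total_gap_mean, sigma=total_gap_std)
    return 1.0 - forecast.cdf(retrain_cost)


def should_retrain(total_gap_mean: float, total_gap_std: float,
                   retrain_cost: float, threshold: float = 0.5) -> bool:
    """Hypothetical rule: retrain when the probability that retraining
    pays off reaches `threshold` (an assumed, user-chosen parameter)."""
    return retrain_probability(total_gap_mean, total_gap_std,
                               retrain_cost) >= threshold
```

For instance, `should_retrain(10.0, 2.0, 5.0)` favors retraining because the forecast gap is well above the cost, while `should_retrain(1.0, 2.0, 5.0)` does not. Raising `threshold` makes the rule more conservative when the forecast is uncertain.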
Taxonomy
Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics · Machine Learning and Data Classification
