Cost-Effective Retraining of Machine Learning Models
Ananth Mahadevan, Michael Mathioudakis

TL;DR
This paper introduces Cara, a cost-aware algorithm for deciding when to retrain machine learning models, balancing accuracy and retraining costs to maintain performance efficiently over changing data streams.
Contribution
The paper presents Cara, a novel algorithm that optimizes retraining decisions considering data drift and costs, outperforming existing baselines in accuracy and cost efficiency.
Findings
Cara adapts to different types of data drift and to different retraining costs
It achieves better accuracy than drift-detection baselines
It reduces total retraining costs while maintaining performance
Abstract
It is important to retrain a machine learning (ML) model in order to maintain its performance as the data changes over time. However, this can be costly as it usually requires processing the entire dataset again. This creates a trade-off between retraining too frequently, which leads to unnecessary computing costs, and not retraining often enough, which results in stale and inaccurate ML models. To address this challenge, we propose ML systems that make automated and cost-effective decisions about when to retrain an ML model. We aim to optimize the trade-off by considering the costs associated with each decision. Our research focuses on determining whether to retrain or keep an existing ML model based on various factors, including the data, the model, and the predictive queries answered by the model. Our main contribution is a Cost-Aware Retraining Algorithm called Cara, which optimizes…
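The trade-off described in the abstract can be illustrated with a deliberately simple rule. The sketch below is not Cara and is not taken from the paper. It assumes that each incoming batch can be assigned a numeric "staleness cost" for keeping the current model, and it retrains once the staleness cost accumulated since the last retraining reaches the cost of one retraining. The names decide_retraining, staleness_costs, and retrain_cost are hypothetical and chosen only for this illustration.

```python
from typing import List


def decide_retraining(staleness_costs: List[float], retrain_cost: float) -> List[bool]:
    """Return one retrain/keep decision per batch (True means retrain).

    Illustrative rule only (an assumption, not the paper's algorithm):
    accumulate the cost of keeping a stale model and retrain as soon as
    the accumulated cost reaches the cost of a single retraining.
    """
    decisions = []
    accumulated = 0.0  # staleness cost incurred since the last retraining
    for cost in staleness_costs:
        accumulated += cost
        retrain = accumulated >= retrain_cost
        decisions.append(retrain)
        if retrain:
            accumulated = 0.0  # a fresh model starts with no staleness
    return decisions


if __name__ == "__main__":
    # Hypothetical per-batch staleness costs and a fixed retraining cost.
    print(decide_retraining([1.0, 2.0, 3.0, 0.5, 0.5, 4.0], retrain_cost=5.0))
```

A larger retrain_cost makes this rule retrain less often, and a faster-drifting stream (larger staleness costs) makes it retrain more often, which is the qualitative behavior the abstract asks of a cost-aware retraining policy.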
Taxonomy
Topics: Machine Learning and Data Classification
