Cost-Effective Retraining of Machine Learning Models

Ananth Mahadevan; Michael Mathioudakis

arXiv:2310.04216 · cs.LG · October 9, 2023 · 1 cites

Open Access

TL;DR

This paper introduces Cara, a cost-aware algorithm for deciding when to retrain machine learning models, balancing accuracy and retraining costs to maintain performance efficiently over changing data streams.

Contribution

The paper presents Cara, a novel algorithm that optimizes retraining decisions considering data drift and costs, outperforming existing baselines in accuracy and cost efficiency.

Findings

01 Cara adapts to different data drifts and costs

02 Achieves better accuracy than drift detection baselines

03 Reduces total retraining costs while maintaining performance

Abstract

It is important to retrain a machine learning (ML) model in order to maintain its performance as the data changes over time. However, this can be costly as it usually requires processing the entire dataset again. This creates a trade-off between retraining too frequently, which leads to unnecessary computing costs, and not retraining often enough, which results in stale and inaccurate ML models. To address this challenge, we propose ML systems that make automated and cost-effective decisions about when to retrain an ML model. We aim to optimize the trade-off by considering the costs associated with each decision. Our research focuses on determining whether to retrain or keep an existing ML model based on various factors, including the data, the model, and the predictive queries answered by the model. Our main contribution is a Cost-Aware Retraining Algorithm called Cara, which optimizes…
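
The retrain-or-keep trade-off described in the abstract can be illustrated with a small sketch. This is not Cara, whose details are cut off above. It is a simple cumulative-threshold heuristic written for illustration: retrain once the estimated cost of keeping the stale model, summed since the last retrain, reaches the cost of retraining. The names `Decision`, `cumulative_threshold_policy`, `total_cost`, `staleness_costs`, and `retrain_cost` are all hypothetical, and the per-step staleness estimates are assumed to be supplied by the caller in the same units as the retraining cost.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Decision:
    step: int                  # index of the incoming data batch
    retrain: bool              # True = retrain, False = keep the existing model
    accumulated: float         # staleness cost accumulated when deciding


def cumulative_threshold_policy(
    staleness_costs: Sequence[float], retrain_cost: float
) -> List[Decision]:
    """Illustrative heuristic (an assumption, not the paper's algorithm).

    staleness_costs[t] is a caller-supplied estimate of the cost of answering
    batch t's queries with the current model. The policy retrains as soon as
    the cost accumulated since the last retrain reaches retrain_cost.
    """
    decisions: List[Decision] = []
    accumulated = 0.0
    for step, cost in enumerate(staleness_costs):
        accumulated += cost
        if accumulated >= retrain_cost:
            decisions.append(Decision(step, True, accumulated))
            accumulated = 0.0  # a fresh model resets the staleness budget
        else:
            decisions.append(Decision(step, False, accumulated))
    return decisions


def total_cost(
    decisions: Sequence[Decision],
    staleness_costs: Sequence[float],
    retrain_cost: float,
) -> float:
    """Pay retrain_cost on retrain steps and the staleness cost on keep steps."""
    return sum(
        retrain_cost if d.retrain else staleness_costs[d.step]
        for d in decisions
    )


# Usage with made-up numbers: five batches, retraining costs 5 units.
costs = [1.0, 2.0, 3.0, 1.0, 1.0]
plan = cumulative_threshold_policy(costs, retrain_cost=5.0)
retrain_steps = [d.step for d in plan if d.retrain]
overall = total_cost(plan, costs, retrain_cost=5.0)
```

A higher `retrain_cost` makes this heuristic retrain less often, and a lower one makes it retrain more often, which mirrors the tension between unnecessary compute and stale models. A real system would also need a way to estimate the staleness cost from the data, the model, and the queries, which this sketch takes as given.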

Taxonomy

Topics: Machine Learning and Data Classification