SpotTune: Leveraging Transient Resources for Cost-efficient Hyper-parameter Tuning in the Public Cloud
Yan Li, Bo An, Junming Ma, Donggang Cao, Yasha Wang, Hong Mei

TL;DR
SpotTune is a novel method that leverages transient cloud resources with cost-aware strategies to significantly reduce hyper-parameter tuning costs and time in machine learning workflows.
Contribution
This paper introduces SpotTune, a new approach that exploits revocable cloud resources with tailored strategies for efficient and cost-effective hyper-parameter tuning.
Findings
Cost reduced by up to 90%
Performance-cost rate improved by 16.61x
Effective use of transient cloud resources
Abstract
Hyper-parameter tuning (HPT) is crucial for many machine learning (ML) algorithms, but because the search space is large, HPT is usually time-consuming and resource-intensive. Many researchers now train machine learning models on public cloud resources, which is convenient yet expensive. Speeding up the HPT process while reducing its cost is therefore very important for cloud ML users. In this paper, we propose SpotTune, an approach that exploits transient revocable resources in the public cloud with tailored strategies to perform HPT in a parallel and cost-efficient manner. Orchestrating the HPT process on transient servers, SpotTune uses two main techniques, fine-grained cost-aware resource provisioning and ML training trend prediction, to reduce the monetary cost and runtime of HPT processes. Our evaluations show that SpotTune can reduce the cost by up to 90% and achieve a…
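The abstract names two techniques, cost-aware resource provisioning and training trend prediction, without giving details. The Python sketch below is one way such ideas could look; it is not the authors' algorithm. Everything in it is an assumption made for illustration: the `SpotOffer` fields, the `lost_fraction` penalty for revoked work, and the power-law fit used to extrapolate a loss curve.

```python
import math
from dataclasses import dataclass


@dataclass
class SpotOffer:
    """Hypothetical description of one transient instance type."""
    name: str
    price_per_hour: float    # assumed current spot price in USD
    steps_per_hour: float    # assumed training throughput
    revocation_prob: float   # assumed chance of revocation within the hour


def expected_cost_per_step(offer, lost_fraction=0.5):
    # Assumption: a revocation wastes `lost_fraction` of the hour's work.
    useful_steps = offer.steps_per_hour * (1 - offer.revocation_prob * lost_fraction)
    return offer.price_per_hour / useful_steps


def pick_offer(offers):
    # Cost-aware choice: the lowest expected dollars per useful training step.
    return min(offers, key=expected_cost_per_step)


def predict_loss(losses, target_step):
    """Extrapolate a loss curve with a power law, loss = a * step**b.

    Fits a straight line in log-log space by ordinary least squares.
    `losses[i]` is the loss observed at step i + 1; all values must be > 0.
    """
    xs = [math.log(i + 1) for i in range(len(losses))]
    ys = [math.log(v) for v in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept + slope * math.log(target_step))


offers = [
    SpotOffer("type-a", price_per_hour=0.30, steps_per_hour=1000, revocation_prob=0.1),
    SpotOffer("type-b", price_per_hour=0.20, steps_per_hour=800, revocation_prob=0.6),
]
best = pick_offer(offers)

# Synthetic curve following loss = 1 / step, extrapolated to step 10.
observed = [1.0 / step for step in range(1, 6)]
forecast = predict_loss(observed, target_step=10)
```

In this toy setup the cheaper `type-b` loses because its high revocation risk raises its expected cost per useful step. A tuner could use a forecast like `predict_loss` to stop configurations whose projected loss is clearly worse than the current best, freeing budget for more promising ones.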
Taxonomy
Topics: Machine Learning and Data Classification · Parallel Computing and Optimization Techniques · Advanced Neural Network Applications
