Reducing Hyperparameter Tuning Costs in ML, Vision and Language Model Training Pipelines via Memoization-Awareness

Abdelmajid Essofi; Ridwan Salahuddeen; Munachiso Nwadike; Elnura Zhalieva; Kun Zhang; Eric Xing; Willie Neiswanger; Qirong Ho

arXiv:2411.03731 · cs.LG · November 7, 2024

TL;DR

This paper introduces EEIPU, a memoization-aware Bayesian Optimization method that leverages pipeline caching to significantly reduce hyperparameter tuning costs and improve quality across ML, vision, and language model training.

Contribution

The paper presents a novel memoization-aware BO algorithm, EEIPU, which efficiently utilizes pipeline caching to enhance hyperparameter search in costly model training pipelines.

Findings

01 EEIPU evaluates 103% more hyperparameter candidates within the same search budget, on average.

02 EEIPU increases the validation metric by 108% more than other algorithms, on average.

03 EEIPU outperforms recent BO algorithms across machine learning, vision, and language model pipelines.

Abstract

The training or fine-tuning of machine learning, vision, and language models is often implemented as a pipeline: a sequence of stages encompassing data preparation, model training and evaluation. In this paper, we exploit pipeline structures to reduce the cost of hyperparameter tuning for model training/fine-tuning, which is particularly valuable for language models given their high costs in GPU-days. We propose a "memoization-aware" Bayesian Optimization (BO) algorithm, EEIPU, that works in tandem with a pipeline caching system, allowing it to evaluate significantly more hyperparameter candidates per GPU-day than other tuning algorithms. The result is better-quality hyperparameters in the same amount of search time, or equivalently, reduced search time to reach the same hyperparameter quality. In our benchmarks on machine learning (model ensembles), vision (convolutional architecture)…
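The caching idea in the abstract can be illustrated with a minimal sketch. It assumes that each stage is a deterministic function of the previous stage's output and its own hyperparameters, so that the output of any prefix of stages can be reused whenever a new candidate shares that prefix. The class name PrefixCachedPipeline, the toy stages, and the hyperparameter names are hypothetical and are not taken from the paper or its repository; the sketch shows only the memoization side, not the EEIPU acquisition function.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage takes the previous stage's output and its own hyperparameters.
Stage = Callable[[Any, Dict[str, Any]], Any]


class PrefixCachedPipeline:
    """Hypothetical helper: memoizes stage outputs by hyperparameter prefix."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages
        self.cache: Dict[Tuple, Any] = {}
        self.stage_runs = 0  # number of stages actually executed

    @staticmethod
    def _key(prefix: List[Dict[str, Any]]) -> Tuple:
        # Hashable key for the hyperparameters of the first len(prefix) stages.
        return tuple(tuple(sorted(p.items())) for p in prefix)

    def run(self, params: List[Dict[str, Any]]) -> Any:
        assert len(params) == len(self.stages)
        # Find the longest prefix of stages whose output is already cached.
        start, output = 0, None
        for k in range(len(params), 0, -1):
            key = self._key(params[:k])
            if key in self.cache:
                start, output = k, self.cache[key]
                break
        # Execute only the remaining stages, caching each new prefix output.
        for i in range(start, len(self.stages)):
            output = self.stages[i](output, params[i])
            self.stage_runs += 1
            self.cache[self._key(params[: i + 1])] = output
        return output


# Toy stand-ins for data preparation, training, and evaluation.
def prepare(_: Any, hp: Dict[str, Any]) -> List[int]:
    return [x * hp["scale"] for x in range(5)]


def train(data: List[int], hp: Dict[str, Any]) -> float:
    return sum(data) * hp["lr"]


def evaluate(model: float, hp: Dict[str, Any]) -> float:
    return model - hp["threshold"]


pipe = PrefixCachedPipeline([prepare, train, evaluate])
first = pipe.run([{"scale": 2}, {"lr": 0.5}, {"threshold": 1.0}])
runs_after_first = pipe.stage_runs
# Same first two stages, new last-stage hyperparameter: only one stage reruns.
second = pipe.run([{"scale": 2}, {"lr": 0.5}, {"threshold": 3.0}])
runs_after_second = pipe.stage_runs
```

In this sketch the second candidate costs one stage execution instead of three, which is the kind of saving a memoization-aware search could exploit by preferring candidates that share expensive early-stage hyperparameters with earlier evaluations.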

Code & Models

Repositories

anonymousWSDMSubmission/cost-bo (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques