In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization

Herilalaina Rakotoarison, Steven Adriaensen, Neeratyoy Mallik, Samir Garibov, Edward Bergman, Frank Hutter

arXiv:2404.16795·cs.LG·August 14, 2024·1 cites

Open Access · 2 Repos

TL;DR

This paper introduces FT-PFN, a transformer-based surrogate model for freeze-thaw Bayesian optimization. Its predictions are faster and more accurate than those of previous surrogates, and the resulting method achieves state-of-the-art hyperparameter optimization results in deep learning.

Contribution

The paper presents FT-PFN, a novel prior-data fitted network (PFN) that performs Bayesian learning curve extrapolation efficiently, improving freeze-thaw Bayesian optimization.

Findings

01. FT-PFN predictions are 10-100 times faster than previous surrogates.

02. FT-PFN provides more accurate predictions across benchmarks.

03. The in-context freeze-thaw BO method achieves state-of-the-art hyperparameter optimization results.

Abstract

With the increasing computational costs associated with deep learning, automated hyperparameter optimization methods, which rely heavily on black-box Bayesian optimization (BO), face limitations. Freeze-thaw BO offers a promising grey-box alternative, strategically allocating scarce resources incrementally to different configurations. However, the frequent surrogate model updates inherent to this approach pose challenges for existing methods, which must retrain or fine-tune their neural network surrogates online, introducing overhead, instability, and hyper-hyperparameters. In this work, we propose FT-PFN, a novel surrogate for freeze-thaw style BO. FT-PFN is a prior-data fitted network (PFN) that leverages transformers' in-context learning ability to efficiently and reliably do Bayesian learning curve extrapolation in a single forward pass. Our empirical analysis across three…
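To make the freeze-thaw idea in the abstract concrete, the following is a minimal sketch of incremental resource allocation: at each step, one configuration is "thawed" and trained for a single additional epoch, while all others stay frozen. Everything in it is an assumption made for illustration. `toy_learning_curve` is a hypothetical stand-in for real training, `toy_extrapolate` is a crude placeholder where a surrogate such as FT-PFN would sit (it is not FT-PFN and is not Bayesian), and the greedy selection rule is not the acquisition function used in the paper.

```python
import math


def toy_learning_curve(config, epoch):
    """Hypothetical stand-in for training: score after `epoch` epochs."""
    # Saturating curve that approaches config["final"] at rate config["rate"].
    return config["final"] * (1.0 - math.exp(-config["rate"] * epoch))


def toy_extrapolate(curve, max_epochs):
    """Placeholder surrogate (NOT FT-PFN): guess the score at `max_epochs`."""
    if not curve:
        return 1.0  # optimistic guess so every configuration is tried once
    previous = curve[-2] if len(curve) > 1 else 0.0
    gain = max(curve[-1] - previous, 0.0)
    remaining = max_epochs - len(curve)
    # Assume each further epoch yields half the gain of the one before it.
    return curve[-1] + gain * sum(0.5 ** k for k in range(1, remaining + 1))


def freeze_thaw(configs, budget, max_epochs):
    """Spend `budget` single-epoch steps across configurations."""
    curves = [[] for _ in configs]  # partial learning curve per configuration
    for _ in range(budget):
        unfinished = [i for i, c in enumerate(curves) if len(c) < max_epochs]
        if not unfinished:
            break
        # Thaw the configuration with the best extrapolated final score
        # (a greedy rule assumed here for simplicity).
        chosen = max(unfinished,
                     key=lambda i: toy_extrapolate(curves[i], max_epochs))
        epoch = len(curves[chosen]) + 1
        curves[chosen].append(toy_learning_curve(configs[chosen], epoch))
    return curves


configs = [
    {"final": 0.6, "rate": 0.5},
    {"final": 0.9, "rate": 0.5},
    {"final": 0.3, "rate": 0.5},
]
curves = freeze_thaw(configs, budget=12, max_epochs=10)
print([len(c) for c in curves])  # epochs spent on each configuration
```

In this toy run each configuration is tried once, after which the remaining budget goes to the second configuration, whose partial curve extrapolates best. The surrogate is queried at every step on the curves observed so far, which is why the cost of updating and querying it matters in freeze-thaw BO.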

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Heat Transfer and Optimization · Machine Learning and Data Classification

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections · Softmax