In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
Herilalaina Rakotoarison, Steven Adriaensen, Neeratyoy Mallik, Samir Garibov, Edward Bergman, and Frank Hutter

TL;DR
This paper introduces FT-PFN, a transformer-based surrogate model for freeze-thaw Bayesian optimization, achieving faster and more accurate hyperparameter tuning in deep learning with state-of-the-art results.
Contribution
It presents FT-PFN, a novel prior-data fitted network that efficiently performs Bayesian learning curve extrapolation, improving freeze-thaw Bayesian optimization.
Findings
FT-PFN predictions are 10-100 times faster than previous surrogates.
FT-PFN provides more accurate predictions across benchmarks.
The in-context freeze-thaw BO method achieves state-of-the-art hyperparameter optimization results.
Abstract
With the increasing computational costs associated with deep learning, automated hyperparameter optimization methods, strongly relying on black-box Bayesian optimization (BO), face limitations. Freeze-thaw BO offers a promising grey-box alternative, strategically allocating scarce resources incrementally to different configurations. However, the frequent surrogate model updates inherent to this approach pose challenges for existing methods, requiring retraining or fine-tuning their neural network surrogates online, introducing overhead, instability, and hyper-hyperparameters. In this work, we propose FT-PFN, a novel surrogate for Freeze-thaw style BO. FT-PFN is a prior-data fitted network (PFN) that leverages the transformers' in-context learning ability to efficiently and reliably do Bayesian learning curve extrapolation in a single forward pass. Our empirical analysis across three…
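The freeze-thaw idea described in the abstract (spend one unit of training budget at a time on whichever configuration currently looks most promising) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_learning_curve` is a synthetic stand-in for real training, and `naive_extrapolate` is a deliberately simple linear extrapolation that takes the place of a learned surrogate. It is not FT-PFN and does not reproduce the paper's acquisition strategy.

```python
import math


def toy_learning_curve(config, step):
    """Hypothetical validation loss after `step` epochs (stand-in for real training)."""
    # Loss decays exponentially from 1.0 towards a config-dependent floor.
    return config["floor"] + (1.0 - config["floor"]) * math.exp(-config["rate"] * step)


def naive_extrapolate(curve, horizon):
    """Placeholder surrogate (NOT FT-PFN): extend the last observed slope linearly."""
    if len(curve) < 2:
        # Optimistic score so every configuration is observed at least twice.
        return -math.inf
    slope = curve[-1] - curve[-2]
    return curve[-1] + slope * horizon


def freeze_thaw(configs, budget, horizon=1):
    """Spend `budget` epochs one at a time, always thawing the most promising config."""
    curves = [[] for _ in configs]
    for _ in range(budget):
        # Score each partial curve by its extrapolated loss (lower is better).
        scores = [naive_extrapolate(curve, horizon) for curve in curves]
        best = min(range(len(configs)), key=scores.__getitem__)
        # "Thaw" the chosen config: train it for one more epoch, then freeze it again.
        next_step = len(curves[best]) + 1
        curves[best].append(toy_learning_curve(configs[best], next_step))
    return curves


# Hypothetical configurations, described only by their toy curve parameters.
configs = [
    {"floor": 0.3, "rate": 0.5},
    {"floor": 0.1, "rate": 0.2},
    {"floor": 0.2, "rate": 1.0},
]
curves = freeze_thaw(configs, budget=20)
print([len(curve) for curve in curves])  # epochs allocated to each configuration
```

In this sketch the surrogate is re-evaluated after every single epoch, which is why the abstract stresses the cost of surrogate updates: a model that must be retrained or fine-tuned at each step adds overhead, whereas a single forward pass does not.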
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Heat Transfer and Optimization · Machine Learning and Data Classification
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections · Softmax
