Hyperparameter Transfer Learning through Surrogate Alignment for Efficient Deep Neural Network Training
Ilija Ilievski, Jiashi Feng

TL;DR
This paper introduces a surrogate-based transfer learning approach for hyperparameter optimization in deep neural networks, enabling efficient transfer of hyperparameters from a source to a target dataset without hand-designed features.
Contribution
The proposed method learns to transfer hyperparameters between datasets using surrogate models and neural networks, bypassing the need for hand-crafted features and reducing training time.
Findings
Demonstrates effectiveness on three computer vision benchmark datasets.
Achieves comparable performance with fewer hyperparameter evaluations.
Outperforms traditional transfer methods in efficiency.
Abstract
Recently, several optimization methods have been successfully applied to the hyperparameter optimization of deep neural networks (DNNs). These methods work by modeling the joint distribution of hyperparameter values and the corresponding error. They become less practical for modern DNNs, whose training may take several days, because one cannot collect enough observations to model the distribution accurately. To address this challenge, we propose a method that learns to transfer optimal hyperparameter values for a small source dataset to hyperparameter values with comparable performance on a dataset of interest. Unlike existing transfer learning methods, our proposed method does not use hand-designed features. Instead, it uses surrogates to model the hyperparameter-error distributions of the two datasets and trains a neural network to learn the transfer…
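The abstract relies on surrogates that model the hyperparameter-error distribution of a dataset but does not say which surrogate is used. The sketch below assumes a Gaussian process with a squared-exponential kernel, a common surrogate choice, fitted to a handful of (log learning rate, validation error) observations. The names `rbf_kernel` and `GPSurrogate` are hypothetical, the observations are synthetic, and the code only illustrates the surrogate step. It does not reproduce the paper's learned transfer network.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


class GPSurrogate:
    """Hypothetical GP surrogate mapping hyperparameter vectors to error."""

    def __init__(self, length_scale=1.0, variance=1.0, noise=1e-4):
        self.length_scale = length_scale
        self.variance = variance
        self.noise = noise

    def fit(self, x, y):
        self.x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        self.y_mean = y.mean()  # constant prior mean = mean observed error
        k = rbf_kernel(self.x, self.x, self.length_scale, self.variance)
        k += self.noise * np.eye(len(self.x))
        self.chol = np.linalg.cholesky(k)  # K = L L^T, L lower-triangular
        # alpha = K^{-1} (y - mean), computed via two solves with L and L^T
        self.alpha = np.linalg.solve(
            self.chol.T, np.linalg.solve(self.chol, y - self.y_mean)
        )
        return self

    def predict(self, x_new):
        """Return the posterior mean and variance of the error at x_new."""
        x_new = np.asarray(x_new, dtype=float)
        k_star = rbf_kernel(x_new, self.x, self.length_scale, self.variance)
        mean = self.y_mean + k_star @ self.alpha
        v = np.linalg.solve(self.chol, k_star.T)
        var = self.variance - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)


# Synthetic observations: six "training runs" at different log10 learning rates,
# with an invented quadratic error curve whose minimum is at log10(lr) = -2.5.
x_obs = np.linspace(-4.0, -1.0, 6)[:, None]
y_obs = 0.1 * (x_obs[:, 0] + 2.5) ** 2 + 0.2

surrogate = GPSurrogate().fit(x_obs, y_obs)

# Query the surrogate on a dense grid instead of training more networks.
grid = np.linspace(-4.0, -1.0, 301)[:, None]
mean, var = surrogate.predict(grid)
best_log_lr = grid[np.argmin(mean), 0]
print(f"surrogate suggests log10(lr) ~ {best_log_lr:.2f}")
```

Because the surrogate is cheap to query, it can be evaluated thousands of times where each real evaluation would cost a full training run. This is the cost asymmetry the abstract points to for DNNs that take days to train.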
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
