Rethinking the Hyperparameters for Fine-tuning
Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

TL;DR
This paper critically re-evaluates common hyperparameter choices for fine-tuning pre-trained models. It shows that optimal values depend on the target dataset and that momentum, a relatively unexplored parameter, affects transfer learning performance.
Contribution
It provides an extensive empirical study of how hyperparameters affect fine-tuning, highlights the roles of momentum and of similarity between source and target datasets, and challenges standard practices.
Findings
Momentum influences fine-tuning performance.
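Optimal hyperparameters are dataset-dependent and sensitive to the similarity between source and target domains.
Reference-based regularization, which keeps the fine-tuned model close to the pre-trained weights, may not be effective for datasets that are dissimilar to the source.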
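To make the third finding concrete, the sketch below contrasts ordinary weight decay with a reference-based penalty. It assumes an L2-SP-style regularizer, (α/2)·‖w − w0‖², where w0 are the pre-trained weights; this exact form is an assumption rather than something stated on this page, and the sketch ignores the separate plain L2 term that L2-SP applies to a newly added classifier head. The function names are hypothetical, and weights are flat Python lists for simplicity.

```python
def l2_penalty(weights, alpha):
    """Ordinary weight decay: (alpha / 2) * ||w||^2, pulls weights toward zero."""
    return 0.5 * alpha * sum(w * w for w in weights)


def l2_sp_penalty(weights, reference, alpha):
    """Reference-based (L2-SP style, assumed) penalty: (alpha / 2) * ||w - w0||^2.

    Pulls weights toward the pre-trained starting point w0 instead of zero.
    """
    return 0.5 * alpha * sum((w - w0) ** 2 for w, w0 in zip(weights, reference))


def l2_sp_grad(weights, reference, alpha):
    """Gradient of the reference-based penalty: alpha * (w - w0)."""
    return [alpha * (w - w0) for w, w0 in zip(weights, reference)]


if __name__ == "__main__":
    pretrained = [1.0, -2.0, 0.5]   # hypothetical pre-trained weights w0
    fine_tuned = [2.0, -2.0, 0.5]   # hypothetical weights after some fine-tuning
    alpha = 0.1
    # Weight decay penalizes the pre-trained weights themselves;
    # the reference-based penalty only penalizes movement away from them.
    print("weight decay at w0:", l2_penalty(pretrained, alpha))
    print("L2-SP at w0:", l2_sp_penalty(pretrained, pretrained, alpha))
    print("L2-SP after moving:", l2_sp_penalty(fine_tuned, pretrained, alpha))
    print("L2-SP gradient:", l2_sp_grad(fine_tuned, pretrained, alpha))
```

The gradient α·(w − w0) always pulls weights back toward the pre-trained point. One plausible reading of the finding (an interpretation, not the paper's own analysis) is that when the target dataset is far from the source, good solutions may lie far from w0, so this pull works against the optimizer rather than acting as a helpful prior.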
Abstract
Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks. Current practices for fine-tuning typically involve making an ad-hoc choice of hyperparameters and keeping them fixed to values normally used for training from scratch. This paper re-examines several common practices of setting hyperparameters for fine-tuning. Our findings are based on extensive empirical evaluation for fine-tuning on various transfer learning benchmarks. (1) While prior works have thoroughly investigated learning rate and batch size, momentum for fine-tuning is a relatively unexplored parameter. We find that the value of momentum also affects fine-tuning performance and connect it with previous theoretical findings. (2) Optimal hyperparameters for fine-tuning, in particular, the effective learning rate, are not only dataset dependent but also sensitive to…
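The link between momentum and the effective learning rate can be illustrated with a small simulation. This is a minimal sketch assuming the heavy-ball update v ← m·v + g, w ← w − η·v (the convention of PyTorch's torch.optim.SGD with its default dampening=0) and the common definition η_eff = η / (1 − m); the truncated abstract does not spell out the definition, so treat it as an assumption. With a constant gradient the per-step update converges to η·g/(1 − m), so pairs such as (η=0.01, m=0.9) and (η=0.001, m=0.99) end up taking the same step. The function names are hypothetical.

```python
def momentum_steps(lr, momentum, grad=1.0, n_steps=200):
    """Simulate heavy-ball SGD with a constant gradient.

    Update rule (assumed): v <- momentum * v + grad; w <- w - lr * v.
    Returns the size of each parameter step, lr * v.
    """
    v = 0.0
    steps = []
    for _ in range(n_steps):
        v = momentum * v + grad
        steps.append(lr * v)
    return steps


def effective_lr(lr, momentum):
    """Effective learning rate under the assumed definition lr / (1 - momentum)."""
    return lr / (1.0 - momentum)


if __name__ == "__main__":
    # Three (lr, momentum) pairs that share an effective learning rate of 0.1.
    for lr, m in [(0.01, 0.9), (0.001, 0.99), (0.1, 0.0)]:
        steps = momentum_steps(lr, m, n_steps=2000)
        print(f"lr={lr}, m={m}: effective lr={effective_lr(lr, m):.3f}, "
              f"first step={steps[0]:.4f}, final step={steps[-1]:.4f}")
```

All three pairs report an effective learning rate of about 0.1 and converge to a final step of about 0.1, but their first steps differ (0.01, 0.001 and 0.1). Larger momentum means a longer warm-up transient before the step size settles, which this constant-gradient sketch makes visible but which real, noisy gradients complicate. Under this view, changing momentum while holding η fixed partly amounts to changing the effective learning rate.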
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
