$\mu$pscaling small models: Principled warm starts and hyperparameter transfer
Yuxin Ma, Nan Chen, Mateo Díaz, Soufiane Hayou, Dmitriy Kunisky, Soledad Villar

TL;DR
This paper introduces a theoretically grounded method for upscaling small neural models to larger sizes, enabling efficient hyperparameter transfer and faster training convergence across diverse architectures.
Contribution
It presents a general upscaling approach applicable to various architectures, backed by theory, and extends hyperparameter transfer techniques for improved model scaling.
Findings
The proposed method guarantees that a model is equivalent to its widened version.
Hyperparameter transfer with the proposed method is effective on realistic datasets.
The approach accelerates training convergence for upscaled models.
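To make the equivalence finding concrete, here is a minimal sketch of one well-known function-preserving widening for a two-layer ReLU network: each hidden unit is duplicated k times and its outgoing weights are divided by k. This is an illustrative assumption, not the construction from the paper, and the names `forward` and `widen` are hypothetical.

```python
def relu(value):
    return value if value > 0.0 else 0.0

def forward(W1, b1, W2, x):
    """Two-layer ReLU MLP: y = W2 * relu(W1 * x + b1), using plain lists."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def widen(W1, b1, W2, k=2):
    """Hypothetical widening: copy each hidden unit k times, scale outputs by 1/k.

    Copies of hidden unit j occupy indices j*k .. j*k + k - 1, so the
    incoming weights, biases, and outgoing weights stay aligned.
    """
    W1_wide = [list(row) for row in W1 for _ in range(k)]
    b1_wide = [b for b in b1 for _ in range(k)]
    W2_wide = [[w / k for w in row for _ in range(k)] for row in W2]
    return W1_wide, b1_wide, W2_wide

# Toy weights chosen only for illustration.
W1 = [[0.5, -1.0], [1.5, 2.0], [-0.3, 0.7]]
b1 = [0.1, -0.2, 0.0]
W2 = [[1.0, -2.0, 0.5]]
x = [1.0, 2.0]

W1_wide, b1_wide, W2_wide = widen(W1, b1, W2, k=2)
small_out = forward(W1, b1, W2, x)
wide_out = forward(W1_wide, b1_wide, W2_wide, x)
print(small_out, wide_out)
```

The two outputs agree up to floating-point rounding because the k copies of each hidden activation, each weighted by 1/k, sum back to the original contribution. Exact copies also receive identical gradients, so practical schemes usually need some way to break this symmetry; how the paper handles that is not stated in the summary above.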
Abstract
Modern large-scale neural networks are often trained and released in multiple sizes to accommodate diverse inference budgets. To improve efficiency, recent work has explored model upscaling: initializing larger models from trained smaller ones in order to transfer knowledge and accelerate convergence. However, this method can be sensitive to hyperparameters that need to be tuned at the target upscaled model size, which is prohibitively costly to do directly. It remains unclear whether the most common workaround -- tuning on smaller models and extrapolating via hyperparameter scaling laws -- is still sound when using upscaling. We address this with principled approaches to upscaling with respect to model widths and efficiently tuning hyperparameters in this setting. First, motivated by $\mu$P and any-dimensional architectures, we introduce a general upscaling method applicable to a broad…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Machine Learning in Materials Science
