On Warm-Starting Neural Network Training
Jordan T. Ash, Ryan P. Adams

TL;DR
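This paper investigates why warm-starting neural network training (initializing from the weights of a model trained on an earlier subset of the data) often leads to poorer generalization than training from a fresh initialization, even when the training losses are similar. It proposes a simple trick, "shrink and perturb", to improve warm-start performance.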
Contribution
The authors analyze why warm-started neural networks generalize worse than freshly initialized ones and introduce a simple method that mitigates the problem, making it more practical to reuse earlier training runs and save compute.
Findings
Warm-starting can cause poorer generalization than fresh initialization, despite similar training loss.
A simple trick can improve warm-start performance in neural network training.
The proposed method reduces resource usage in deep learning system development.
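As we understand the paper, the simple trick is "shrink and perturb": before training on the enlarged dataset, each weight of the previous model is multiplied by a factor λ between 0 and 1 and then perturbed with small random noise. The sketch below works on a flat list of floats rather than a real network. The function name `shrink_perturb`, its default values, and the use of i.i.d. zero-mean Gaussian noise are illustrative assumptions, not the authors' settings or code.

```python
import random
from typing import List, Optional


def shrink_perturb(
    weights: List[float],
    shrink: float = 0.5,
    noise_std: float = 0.01,
    seed: Optional[int] = None,
) -> List[float]:
    """Hypothetical helper: return shrink * w + Gaussian noise for every weight w.

    shrink (lambda) must lie in [0, 1]: 1 keeps the old weights unchanged
    (a plain warm start when noise_std is 0), while 0 discards them entirely.
    noise_std is the standard deviation of the zero-mean Gaussian noise
    (an assumed noise model for this sketch).
    """
    if not 0.0 <= shrink <= 1.0:
        raise ValueError("shrink must lie in [0, 1]")
    rng = random.Random(seed)  # seeded generator for reproducible noise
    return [shrink * w + rng.gauss(0.0, noise_std) for w in weights]


# Usage: weights from the previous round's model (illustrative values).
previous = [0.8, -1.2, 0.3, 2.0]
next_init = shrink_perturb(previous, shrink=0.5, noise_std=0.01, seed=0)
print(next_init)
```

In a real framework the same update would be applied to every parameter tensor of the previous model before training resumes on the new data.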
Abstract
In many real-world deployments of machine learning systems, data arrive piecemeal. These learning scenarios may be passive, where data arrive incrementally due to structural properties of the problem (e.g., daily financial data) or active, where samples are selected according to a measure of their quality (e.g., experimental design). In both of these cases, we are building a sequence of models that incorporate an increasing amount of data. We would like each of these models in the sequence to be performant and take advantage of all the data that are available to that point. Conventional intuition suggests that when solving a sequence of related optimization problems of this form, it should be possible to initialize using the solution of the previous iterate -- to "warm start" the optimization rather than initialize from scratch -- and see reductions in wall-clock time. However, in…
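The sequential setting described in the abstract can be sketched as a loop in which each round adds a new chunk of data and retrains either from the previous round's solution (warm start) or from a fixed initialization (fresh start). This toy uses full-batch gradient descent on a one-feature linear regression with synthetic data; the function names, data generator, learning rate, and stopping tolerance are hypothetical choices for illustration. Because this problem is convex, it only illustrates the mechanics and the expected reduction in training steps from warm starting; the generalization gap the paper studies arises in neural networks and is not reproduced here.

```python
import random
from typing import List, Tuple

Data = List[Tuple[float, float]]


def make_chunks(n_chunks: int = 5, chunk_size: int = 20, seed: int = 0) -> List[Data]:
    """Synthetic data arriving piecemeal: y = 3x + 1 plus small Gaussian noise."""
    rng = random.Random(seed)
    chunks = []
    for _ in range(n_chunks):
        chunk = []
        for _ in range(chunk_size):
            x = rng.uniform(-1.0, 1.0)
            chunk.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
        chunks.append(chunk)
    return chunks


def train(data: Data, w: float, b: float, lr: float = 0.05,
          tol: float = 1e-4, max_steps: int = 10_000) -> Tuple[float, float, int]:
    """Full-batch gradient descent on mean squared error; returns (w, b, steps taken)."""
    n = len(data)
    for step in range(max_steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        if gw * gw + gb * gb < tol * tol:  # stop once the gradient norm is below tol
            return w, b, step
        w -= lr * gw
        b -= lr * gb
    return w, b, max_steps


def run_sequence(chunks: List[Data], warm: bool) -> Tuple[float, float, List[int]]:
    """Train one model per round on all data seen so far; return final params and steps per round."""
    data: Data = []
    w, b = 0.0, 0.0
    steps_per_round = []
    for chunk in chunks:
        data.extend(chunk)
        if not warm:
            w, b = 0.0, 0.0  # fresh start: reset to the same initialization each round
        w, b, steps = train(data, w, b)  # warm start: reuse w, b from the previous round
        steps_per_round.append(steps)
    return w, b, steps_per_round


chunks = make_chunks()
_, _, warm_steps = run_sequence(chunks, warm=True)
_, _, cold_steps = run_sequence(chunks, warm=False)
print("warm-start steps per round: ", warm_steps)
print("fresh-start steps per round:", cold_steps)
```

The first round is identical in both modes; in later rounds the warm-started run begins near the new optimum and should need fewer steps, which is the wall-clock intuition the abstract describes.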
Taxonomy
Topics: Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
