On Warm-Starting Neural Network Training

Jordan T. Ash; Ryan P. Adams

arXiv:1910.08475 · cs.LG · January 1, 2021 · 56 cites

TL;DR

This paper investigates why warm-starting neural network training often leads to poorer generalization than fresh initialization, despite similar training losses, and proposes a simple trick to improve warm-start performance.

Contribution

The authors analyze why warm-started neural networks generalize worse than freshly initialized ones and introduce a simple method that mitigates the problem, so that training can reuse a previous solution instead of restarting from scratch.

Findings

01

Warm-starting can cause poorer generalization despite similar training loss.

02

A simple trick can improve warm-start performance in neural network training.

03

The proposed method reduces resource usage in deep learning system development.

Abstract

In many real-world deployments of machine learning systems, data arrive piecemeal. These learning scenarios may be passive, where data arrive incrementally due to structural properties of the problem (e.g., daily financial data) or active, where samples are selected according to a measure of their quality (e.g., experimental design). In both of these cases, we are building a sequence of models that incorporate an increasing amount of data. We would like each of these models in the sequence to be performant and take advantage of all the data that are available to that point. Conventional intuition suggests that when solving a sequence of related optimization problems of this form, it should be possible to initialize using the solution of the previous iterate -- to "warm start" the optimization rather than initialize from scratch -- and see reductions in wall-clock time. However, in…
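The two initialization strategies the abstract contrasts can be sketched in a few lines. This is a minimal illustration of the mechanics only, not the authors' code: the helper names (`fresh_init`, `warm_start`, `sgd_fit`), the toy linear model, the synthetic data stream, and all hyperparameters are assumptions made for the example.

```python
import random

def fresh_init(n_params, rng, scale=0.1):
    """Fresh start: draw every parameter anew from N(0, scale^2)."""
    return [rng.gauss(0.0, scale) for _ in range(n_params)]

def warm_start(prev_params):
    """Warm start: copy the previous model's solution unchanged."""
    return list(prev_params)

def sgd_fit(params, data, lr=0.5, epochs=200):
    """Fit y ~ w*x + b by per-sample SGD on squared error; params = [w, b]."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return [w, b]

rng = random.Random(0)
# Hypothetical stream: noiseless points on y = 2x + 1, arriving in two chunks.
points = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(10)]
rounds = [points[:5], points]  # each round sees all data available so far

cold_models, warm_models = [], []
prev = fresh_init(2, rng)  # the first warm model also starts from random
for data in rounds:
    # Cold: discard the previous solution and retrain from a new random draw.
    cold_models.append(sgd_fit(fresh_init(2, rng), data))
    # Warm: continue from the solution found on the previous round.
    prev = sgd_fit(warm_start(prev), data)
    warm_models.append(prev)
```

On this convex toy problem both strategies end at the same fit, so the sketch shows only how the sequence of models is built. The generalization gap the paper studies concerns neural networks and would not appear here.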

Code & Models

Repositories

CS-433/cs-433-project-2-fesenjoon
pytorch

Datasets

Kylan12/Synthetic-AI-ML-Dataset
dataset · 42 downloads

Videos

On Warm-Starting Neural Network Training · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning