The Early Phase of Neural Network Training

Jonathan Frankle; David J. Schwab; Ari S. Morcos

arXiv:2002.10365 · cs.LG · February 25, 2020 · 50 cites

TL;DR

This paper measures how deep neural networks change during the earliest iterations of training. Weight distributions become highly non-independent within a few hundred iterations, and pre-training with blurred inputs or self-supervised tasks can approximate these early changes, which suggests they are not inherently dependent on labels.

Contribution

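The paper provides a detailed quantitative analysis of how networks change during early training. It shows that networks are not robust to reinitialization with random weights that keeps only the weight signs, and it measures how well different pre-training strategies reproduce the early changes.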

Findings

01

Deep networks are not robust to reinitialization with random weights that preserves only the signs of the existing weights.

02

Weight distributions become highly non-independent within hundreds of iterations.

03

Pre-training with blurred inputs or self-supervised tasks approximates early training changes.
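
To make finding 01 concrete, the following is a minimal sketch of what reinitializing weights while maintaining their signs could look like. It is an illustration under stated assumptions, not the authors' implementation: the function name sign_preserving_reinit, the flat list of weights, and the Gaussian resampling with standard deviation std are all hypothetical choices, and the paper's exact procedure may differ.

```python
import random


def sign_preserving_reinit(weights, std, seed=0):
    """Resample each weight's magnitude while keeping its original sign.

    Assumptions (not taken from the paper): `weights` is a flat list of
    floats, and new magnitudes are drawn as |N(0, std^2)|, where `std`
    stands in for the layer's initialization scale.
    """
    rng = random.Random(seed)
    reinitialized = []
    for w in weights:
        magnitude = abs(rng.gauss(0.0, std))
        if w > 0:
            reinitialized.append(magnitude)
        elif w < 0:
            reinitialized.append(-magnitude)
        else:
            # Exactly-zero weights carry no sign, so they are left at zero.
            reinitialized.append(0.0)
    return reinitialized


# Hypothetical weights from a partially trained layer.
trained = [0.31, -0.07, 0.0, -1.2, 0.55]
resampled = sign_preserving_reinit(trained, std=0.1)
```

Under this sketch, the resampled weights share their sign pattern with the trained weights but none of their magnitudes. The finding is that keeping only that sign pattern does not preserve the network's behavior.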

Abstract

Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that deep neural networks undergo during this early phase of training. We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset. We find that, within this framework, deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent…

Code & Models

Repositories

facebookresearch/open_lth (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning