Unique Properties of Flat Minima in Deep Networks

Rotem Mulayoff; Tomer Michaeli

arXiv:2002.04710 · cs.LG · August 11, 2020 · 1 cites

TL;DR

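This paper characterizes flat minima in deep linear networks, showing that they correspond to nearly balanced networks whose consecutive layers are coupled through shared singular vectors, so that the gain from the input to each intermediate representation changes little from one layer to the next. These results have implications for understanding training dynamics.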

Contribution

It provides a theoretical analysis of flat minima in linear neural networks, revealing their structure and coupling properties, and extends findings to nonlinear models.

Findings

1. Linear ResNets with zero initialization converge to the flattest minima.

2. Flat minima correspond to nearly balanced networks with stable layer gains.

3. Layers in flat minima are coupled through singular vectors, forming a dedicated signal path.

Abstract

It is well known that (stochastic) gradient descent has an implicit bias towards flat minima. In deep neural network training, this mechanism serves to screen out minima. However, the precise effect that this has on the trained network is not yet fully understood. In this paper, we characterize the flat minima in linear neural networks trained with a quadratic loss. First, we show that linear ResNets with zero initialization necessarily converge to the flattest of all minima. We then prove that these minima correspond to nearly balanced networks, whereby the gain from the input to any intermediate representation does not change drastically from one layer to the next. Finally, we show that consecutive layers in flat minima solutions are coupled. That is, one of the left singular vectors of each weight matrix equals one of the right singular vectors of the next matrix. This forms a…
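
As a hedged illustration of the balance and coupling properties described in the abstract, the following NumPy sketch builds a deep linear factorization of a square target matrix in which every layer carries an equal share of each singular value and consecutive layers share singular vectors. The names `balanced_factors` and `top_singular_pair`, the target `T`, and the random intermediate bases are assumptions made for illustration only. This is an exactly balanced special case, not the paper's construction or proof, and it does not measure flatness.

```python
import numpy as np


def balanced_factors(T, depth, seed=0):
    """Factor a square matrix T into `depth` layers W_0, ..., W_{depth-1}
    with W_{depth-1} @ ... @ W_0 == T (hypothetical helper, illustration only).

    Each layer has singular values s**(1/depth), and the left singular
    vectors of layer k are the right singular vectors of layer k+1.
    """
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(T)
    n = T.shape[0]
    root = np.diag(s ** (1.0 / depth))

    # Orthonormal bases: input side (V), arbitrary intermediate ones, output side (U).
    bases = [Vt.T]
    for _ in range(depth - 1):
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
        bases.append(Q)
    bases.append(U)

    # Layer k maps basis k to basis k+1 and scales by the depth-th root of s.
    return [bases[k + 1] @ root @ bases[k].T for k in range(depth)]


def top_singular_pair(W):
    """Return (top left singular vector, top right singular vector) of W."""
    U, _, Vt = np.linalg.svd(W)
    return U[:, 0], Vt[0]


# Assumed target with known, well separated singular values (4, 2, 1, 0.5).
rng = np.random.default_rng(1)
Qa, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Qb, _ = np.linalg.qr(rng.standard_normal((4, 4)))
T = Qa @ np.diag([4.0, 2.0, 1.0, 0.5]) @ Qb.T

depth = 3
Ws = balanced_factors(T, depth)

# End-to-end map: the last layer is applied last, so multiply in reverse order.
product = np.linalg.multi_dot(Ws[::-1])

# Gain (spectral norm) from the input to each intermediate representation.
gains = []
partial = np.eye(4)
for W in Ws:
    partial = W @ partial
    gains.append(np.linalg.norm(partial, 2))

# Coupling: |<top left vector of layer k, top right vector of layer k+1>|.
alignments = []
for k in range(depth - 1):
    u_k, _ = top_singular_pair(Ws[k])
    _, v_next = top_singular_pair(Ws[k + 1])
    alignments.append(abs(u_k @ v_next))

print("reconstruction error:", np.linalg.norm(product - T))
print("input-to-layer gains:", gains)
print("top singular vector alignments:", alignments)
```

In this construction the input-to-layer gain grows by the same factor, 4^(1/3), at every layer, and the top left singular vector of each layer coincides (up to sign) with the top right singular vector of the next. The abstract claims only near balance and coupling through one singular vector pair, so the sketch should be read as an idealized instance of those properties.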

Videos

Unique Properties of Flat Minima in Deep Networks · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM