# Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width

**Authors:** Dayal Singh Kalra, Maissam Barkeshli

arXiv: 2302.12250 · 2023-10-25

## TL;DR

This paper studies how learning rate, depth, and width shape the early training dynamics of deep neural networks trained with SGD. Tracking sharpness (the maximum Hessian eigenvalue of the loss), it identifies four training regimes and a "sharpness reduction" phase that opens up as depth and inverse width increase.

## Contribution

It provides a systematic analysis of SGD optimization dynamics in DNNs through the maximum Hessian eigenvalue, maps the early-time phase diagram as a function of learning rate, depth, and width, and identifies a sharpness reduction phase during early training.

## Key findings

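- Four distinct training regimes, identified from the maximum Hessian eigenvalue (sharpness): early time transient, intermediate saturation, progressive sharpening, and late time "edge of stability".
- A "sharpness reduction" phase, in which sharpness decreases at early times, that opens up as depth $d$ and inverse width $1/w$ increase.
- Several critical values of $c$ (the learning rate scaled by initial sharpness, $\eta \equiv c / \lambda_0^H$) that separate qualitatively distinct early-time behavior of training loss and sharpness.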

## Abstract

We systematically analyze optimization dynamics in deep neural networks (DNNs) trained with stochastic gradient descent (SGD) and study the effect of learning rate $\eta$, depth $d$, and width $w$ of the neural network. By analyzing the maximum eigenvalue $\lambda^H_t$ of the Hessian of the loss, which is a measure of sharpness of the loss landscape, we find that the dynamics can show four distinct regimes: (i) an early time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and (iv) a late time "edge of stability" regime. The early and intermediate regimes (i) and (ii) exhibit a rich phase diagram depending on $\eta \equiv c / \lambda_0^H$, $d$, and $w$. We identify several critical values of $c$, which separate qualitatively distinct phenomena in the early time dynamics of training loss and sharpness. Notably, we discover the opening up of a "sharpness reduction" phase, where sharpness decreases at early times, as $d$ and $1/w$ are increased.
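To make the parameterization $\eta \equiv c / \lambda_0^H$ concrete, here is a minimal sketch that estimates sharpness by power iteration and then sets the learning rate from a chosen $c$. It is not the paper's code or experimental setup: the toy quadratic loss, the finite-difference Hessian-vector product, and all function names (`sharpness`, `gd_losses`, and so on) are assumptions made for illustration. On a quadratic, gradient descent is stable only for $c < 2$; the critical values of $c$ the paper reports for DNNs are not reproduced here.

```python
import math

# Toy quadratic loss L(theta) = 0.5 * sum_i a_i * theta_i^2. This is an
# illustrative assumption, not the paper's model. Its Hessian is diag(a),
# so the true sharpness is max(a) = 4.0.
CURVATURES = [4.0, 1.0, 0.25]


def loss(theta):
    return 0.5 * sum(a * t * t for a, t in zip(CURVATURES, theta))


def grad(theta):
    return [a * t for a, t in zip(CURVATURES, theta)]


def hessian_vector_product(theta, v, eps=1e-3):
    """Approximate H @ v by a central finite difference of the gradient."""
    plus = grad([t + eps * x for t, x in zip(theta, v)])
    minus = grad([t - eps * x for t, x in zip(theta, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]


def sharpness(theta, iters=100):
    """Estimate the top Hessian eigenvalue by power iteration.

    Power iteration converges to the eigenvalue of largest magnitude,
    which equals the maximum eigenvalue when the Hessian is positive
    semidefinite, as it is for this toy loss.
    """
    n = len(theta)
    v = [1.0 / math.sqrt(n)] * n  # deterministic unit start vector
    lam = 0.0
    for _ in range(iters):
        hv = hessian_vector_product(theta, v)
        lam = sum(x * y for x, y in zip(v, hv))  # Rayleigh quotient
        norm = math.sqrt(sum(x * x for x in hv))
        v = [x / norm for x in hv]
    return lam


def gd_losses(theta, eta, steps):
    """Run full-batch gradient descent and record the loss at every step."""
    losses = [loss(theta)]
    for _ in range(steps):
        theta = [t - eta * g for t, g in zip(theta, grad(theta))]
        losses.append(loss(theta))
    return losses


theta0 = [1.0, 1.0, 1.0]
lam0 = sharpness(theta0)                      # initial sharpness lambda_0^H
stable = gd_losses(theta0, 1.0 / lam0, 10)    # c = 1.0: loss decreases
unstable = gd_losses(theta0, 2.5 / lam0, 10)  # c = 2.5: loss grows
```

Scaling the learning rate by the initial sharpness makes $c$ comparable across networks whose raw curvature differs, which is why the phase diagram is drawn in terms of $c$, $d$, and $w$ rather than $\eta$ directly. For a real network one would typically replace the finite-difference product with an autodiff Hessian-vector product.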

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.12250/full.md

## Figures

196 figures with captions in the complete paper: https://tomesphere.com/paper/2302.12250/full.md

## References

72 references — full list in the complete paper: https://tomesphere.com/paper/2302.12250/full.md

---
Source: https://tomesphere.com/paper/2302.12250