Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More
Geonhui Yoo, Minhak Song, Chulhee Yun

TL;DR
This paper investigates progressive sharpening in neural network training using a minimalist model, a deep linear network with a single neuron per layer, and provides theoretical insights and empirical evidence on how dataset properties, network depth, optimizer stochasticity, and step size influence sharpness dynamics.
Contribution
It introduces a minimalist linear network model to analyze sharpness dynamics and extends the theoretical findings to practical neural network training scenarios.
Findings
Sharpness increases with network depth and dataset difficulty.
Stochasticity in optimizers affects the degree of progressive sharpening.
Theoretical insights align with empirical observations in real neural networks.
Abstract
When training deep neural networks with gradient descent, sharpness often increases -- a phenomenon known as progressive sharpening -- before saturating at the edge of stability. Although commonly observed in practice, the underlying mechanisms behind progressive sharpening remain poorly understood. In this work, we study this phenomenon using a minimalist model: a deep linear network with a single neuron per layer. We show that this simple model effectively captures the sharpness dynamics observed in recent empirical studies, offering a simple testbed to better understand neural network training. Moreover, we theoretically analyze how dataset properties, network depth, stochasticity of optimizers, and step size affect the degree of progressive sharpening in the minimalist model. We then empirically demonstrate how these theoretical insights extend to practical scenarios. This study…
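To make the setting concrete, here is a minimal sketch of progressive sharpening in a chain of scalar weights. It is an illustration under stated assumptions, not the paper's exact setup: the single training example (input 1, target `y`), the squared loss, the balanced initialization, and all function names are hypothetical choices. Sharpness is taken to be the largest eigenvalue of the loss Hessian, the usual definition in this literature.

```python
import numpy as np

# Assumed toy model (not taken from the paper): f(w) = w_1 * w_2 * ... * w_D,
# trained on one example with input 1 and target y under the loss
# L(w) = 0.5 * (prod(w) - y)^2.

def product_excluding(w, skip):
    """Product of all entries of w except those at the indices in `skip`."""
    mask = np.ones(len(w), dtype=bool)
    mask[list(skip)] = False
    return np.prod(w[mask])

def loss(w, y):
    return 0.5 * (np.prod(w) - y) ** 2

def gradient(w, y):
    residual = np.prod(w) - y
    return np.array([residual * product_excluding(w, [i]) for i in range(len(w))])

def hessian(w, y):
    d = len(w)
    residual = np.prod(w) - y
    # First derivatives of the product with respect to each weight.
    dp = np.array([product_excluding(w, [i]) for i in range(d)])
    H = np.outer(dp, dp)  # Gauss-Newton term
    for i in range(d):
        for j in range(d):
            if i != j:
                # Residual times the mixed second derivative of the product.
                H[i, j] += residual * product_excluding(w, [i, j])
    return H

def sharpness(w, y):
    # eigvalsh returns eigenvalues in ascending order; the last one is the largest.
    return np.linalg.eigvalsh(hessian(w, y))[-1]

def train(depth=3, y=2.0, init=0.5, lr=0.01, steps=2000):
    """Plain gradient descent; records (loss, sharpness) at every step."""
    w = np.full(depth, init)
    history = [(loss(w, y), sharpness(w, y))]
    for _ in range(steps):
        w = w - lr * gradient(w, y)
        history.append((loss(w, y), sharpness(w, y)))
    return w, history

w_final, history = train()
```

Comparing `history[0][1]` with `history[-1][1]` shows the sharpness at initialization and after training. With these assumed settings the step size stays well below 2 divided by the final sharpness, so the run converges without reaching the edge of stability. Varying `depth`, `y`, or `lr` gives a rough feel for how the amount of sharpening depends on them in this toy setting.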
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Reservoir Computing · Machine Learning in Materials Science
