Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training
Cristian Pérez-Corral, Alberto Fernández-Hernández, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí

TL;DR
This paper studies the training dynamics of ReLU-based neural networks and reports a two-stage process: an early stage with substantial changes in activation patterns, followed by a later stage in which weight updates refine the model within largely stable activation regimes. The claim is supported by theoretical and empirical analysis.
Contribution
The paper proves a local stability property of activation patterns and empirically demonstrates decoupled two-timescale behavior in neural network training across several architectures.
Findings
Activation pattern changes decay earlier than weight updates.
Training often proceeds within stable activation regimes in late stages.
The analysis provides a framework for monitoring and understanding training dynamics.
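The kind of monitoring described in these findings can be sketched as follows. This is a minimal illustration, not the authors' code: the one-hidden-layer network, the synthetic data, the learning rate, and the names `activation_pattern` and `train_and_monitor` are all assumptions made for the example. It records, at each gradient step, the fraction of hidden-unit activations that flipped and the relative size of the parameter update, which are the two signals whose decay the paper compares.

```python
import numpy as np

def activation_pattern(W, b, X):
    """Boolean matrix: which hidden ReLU units are active for each input."""
    return (X @ W + b) > 0

def train_and_monitor(steps=200, lr=0.05, seed=0):
    """Train a toy ReLU network and log per-step pattern flips and update size.

    Everything here (architecture, data, hyperparameters) is hypothetical.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 4))                  # synthetic inputs
    y = np.sin(X.sum(axis=1, keepdims=True))      # synthetic regression target
    W1 = rng.normal(size=(4, 16)) * 0.5
    b1 = np.zeros(16)
    W2 = rng.normal(size=(16, 1)) * 0.5

    flip_rate, rel_update = [], []
    prev = activation_pattern(W1, b1, X)
    n = len(X)
    for _ in range(steps):
        # Forward pass.
        Z = X @ W1 + b1
        H = np.maximum(Z, 0.0)
        err = H @ W2 - y
        # Gradients of the loss (1 / 2n) * sum(err ** 2).
        gW2 = H.T @ err / n
        gZ = (err @ W2.T) * (Z > 0)
        gW1 = X.T @ gZ / n
        gb1 = gZ.sum(axis=0) / n
        # Size of this step relative to the current parameter norm.
        param_norm = np.sqrt((W1 ** 2).sum() + (b1 ** 2).sum() + (W2 ** 2).sum())
        step_norm = lr * np.sqrt((gW1 ** 2).sum() + (gb1 ** 2).sum() + (gW2 ** 2).sum())
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        # Fraction of (input, unit) pairs whose on/off state changed this step.
        cur = activation_pattern(W1, b1, X)
        flip_rate.append(float(np.mean(cur != prev)))
        rel_update.append(float(step_norm / param_norm))
        prev = cur
    return np.array(flip_rate), np.array(rel_update)

flips, updates = train_and_monitor()
```

Plotting `flips` and `updates` against the step index gives a rough view of whether pattern changes settle before weight updates do on this toy problem. Whether that happens here depends on the assumed setup and says nothing about the paper's reported results.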
Abstract
Despite the empirical success of deep neural networks (DNNs), their internal training dynamics remain difficult to characterize. In ReLU-based models, the activation pattern induced by a given input determines the piecewise-linear region in which the network behaves affinely. Motivated by this geometry, we investigate whether training exhibits a two-timescale behavior: an early stage with substantial changes in activation patterns and a later stage where weight updates predominantly refine the model within largely stable activation regimes. We first prove a local stability property: outside measure-zero sets of parameters and inputs, sufficiently small parameter perturbations preserve the activation pattern of a fixed input, implying locally affine behavior within activation regions. We then empirically track per-iteration changes in weights and activation patterns across fully-connected and convolutional…
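The local stability property can be illustrated numerically for a single ReLU layer. The sketch below is an assumption-laden toy, not the paper's proof or setup: the layer sizes, the random draws, and the function name `stable_under_small_perturbation` are hypothetical. The idea is that if every pre-activation of a fixed input is nonzero (which holds outside a measure-zero set), then any parameter perturbation that shifts each pre-activation by less than the smallest absolute pre-activation cannot flip a unit.

```python
import numpy as np

def stable_under_small_perturbation(seed=1):
    """Check that a sufficiently small perturbation keeps the activation pattern.

    Single hidden layer only; all sizes and distributions are illustrative.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=5)            # fixed input
    W = rng.normal(size=(5, 8))       # layer weights
    b = rng.normal(size=8)            # layer biases

    z = x @ W + b                     # pre-activations of the fixed input
    margin = np.min(np.abs(z))        # distance of the closest unit to its kink

    # Arbitrary perturbation direction.
    dW = rng.normal(size=W.shape)
    db = rng.normal(size=b.shape)
    # Cauchy-Schwarz bound on the shift of unit i:
    # |x @ dW[:, i] + db[i]| <= ||x|| * ||dW[:, i]|| + |db[i]|
    bound = np.linalg.norm(x) * np.linalg.norm(dW, axis=0) + np.abs(db)
    # Scale the perturbation so every shift is at most half the margin.
    scale = 0.5 * margin / bound.max()

    z_new = x @ (W + scale * dW) + (b + scale * db)
    preserved = np.array_equal(z > 0, z_new > 0)
    return margin, preserved

margin, preserved = stable_under_small_perturbation()
```

Because the scaled shift of each unit is bounded by half the margin, no pre-activation can cross zero, so `preserved` is `True` whenever `margin` is positive. The deeper, multi-layer statement in the abstract requires the paper's argument and is not covered by this one-layer check.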
Taxonomy
Topics: Neural Networks and Reservoir Computing · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
