Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training

Cristian Pérez-Corral; Alberto Fernández-Hernández; Jose I. Mestre; Manuel F. Dolz; Jose Duato; Enrique S. Quintana-Ortí

arXiv:2602.08333 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper investigates the training dynamics of ReLU-based neural networks and reports a two-stage process: substantial early changes in activation patterns give way to later refinement within largely stable activation regimes. The claim is supported by theoretical and empirical analysis.

Contribution

The paper proves a local stability property of ReLU activation patterns and empirically demonstrates decoupled two-timescale behavior in neural network training across various architectures.

Findings

01. Activation pattern changes decay earlier than weight updates.

02. In late stages, training often proceeds within stable activation regimes.

03. The analysis provides a framework for monitoring and understanding training dynamics.

Abstract

Despite the empirical success of deep neural networks (DNNs), their internal training dynamics remain difficult to characterize. In ReLU-based models, the activation pattern induced by a given input determines the piecewise-linear region in which the network behaves affinely. Motivated by this geometry, we investigate whether training exhibits a two-timescale behavior: an early stage with substantial changes in activation patterns and a later stage where weight updates predominantly refine the model within largely stable activation regimes. We first prove a local stability property: outside measure-zero sets of parameters and inputs, sufficiently small parameter perturbations preserve the activation pattern of a fixed input, implying locally affine behavior within activation regions. We then empirically track per-iteration changes in weights and activation patterns across fully-connected and convolutional…
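
The local stability property described in the abstract can be illustrated on a toy network. The following is a minimal sketch, not the authors' code: the function names (`activation_pattern`, `pattern_change_rate`, `shift_parameters`), the hand-picked weights, and the uniform scalar shift used as a perturbation are all assumptions made for illustration. It records which ReLU units are active for a fixed input, then compares the pattern before and after a small and a large parameter perturbation.

```python
import numpy as np


def activation_pattern(weights, biases, x):
    """Binary ReLU activation pattern (one flag per hidden unit) for input x."""
    flags = []
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        z = W @ h + b            # pre-activations of this layer
        flags.append(z > 0)      # True where the ReLU is active
        h = np.maximum(z, 0.0)   # ReLU output feeds the next layer
    return np.concatenate(flags)


def pattern_change_rate(p_old, p_new):
    """Fraction of units whose on/off state differs between two patterns."""
    return float(np.mean(p_old != p_new))


def shift_parameters(weights, biases, delta):
    """Hypothetical perturbation: add the same scalar delta to every parameter."""
    return [W + delta for W in weights], [b + delta for b in biases]


# Hand-picked toy network (an assumption): 2 inputs -> 2 ReLU units -> 1 ReLU unit.
weights = [np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([[2.0, 1.0]])]
biases = [np.array([0.1, -3.0]), np.array([-0.1])]
x = np.array([1.0, 1.0])

base = activation_pattern(weights, biases, x)
small = activation_pattern(*shift_parameters(weights, biases, 1e-3), x)
large = activation_pattern(*shift_parameters(weights, biases, 1.0), x)

print("small shift change rate:", pattern_change_rate(base, small))
print("large shift change rate:", pattern_change_rate(base, large))
```

In this toy case no pre-activation is exactly zero, so the small shift leaves the pattern unchanged while the large shift flips one of the three units. In a training loop, one plausible way to monitor the two timescales would be to evaluate `pattern_change_rate` on a fixed probe batch before and after each optimizer step and compare its decay against the norm of the weight update.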

Taxonomy

Topics: Neural Networks and Reservoir Computing · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques