Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training
Ivan Pasichnyk

TL;DR
This paper introduces a parameter-free, time-varying momentum schedule inspired by critical damping, which accelerates training, diagnoses problematic layers, and enables targeted corrections in neural networks.
Contribution
It derives a momentum schedule from the physics of critical damping and uses it as a diagnostic tool for identifying and correcting failure modes in trained neural networks.
Findings
Beta-scheduling reaches 90% accuracy on CIFAR-10 with ResNet-18 1.9x faster than constant momentum.
The diagnostic identifies the same three problem layers whether the model was trained with SGD or Adam (100% overlap).
Correcting identified layers improves accuracy with minimal retraining.
Abstract
Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: mu(t) = 1 - 2*sqrt(alpha(t)), where alpha(t) is the current learning rate. This beta-schedule requires zero free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-scheduling delivers 1.9x faster convergence to 90% accuracy compared to constant momentum. More importantly, the per-layer gradient attribution under this schedule produces a cross-optimizer invariant diagnostic: the same three problem layers are identified regardless of whether the model was trained with SGD or Adam (100% overlap). Surgical correction of only these layers fixes 62 misclassifications while retraining only 18% of…
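The schedule stated in the abstract, mu(t) = 1 - 2*sqrt(alpha(t)), can be illustrated with a minimal Python sketch. Only the formula comes from the abstract. The function names, the cosine learning-rate schedule, the base learning rate of 0.1, and the clamp at zero (the formula turns negative once the learning rate exceeds 0.25) are assumptions made for illustration, not details confirmed by the paper.

```python
import math


def critical_momentum(lr: float) -> float:
    """Momentum from the abstract's formula: mu = 1 - 2*sqrt(lr).

    The clamp at 0 is an assumption: the formula goes negative for lr > 0.25,
    and the abstract does not say how that case is handled.
    """
    if lr < 0:
        raise ValueError("learning rate must be non-negative")
    return max(0.0, 1.0 - 2.0 * math.sqrt(lr))


def cosine_lr(step: int, total_steps: int, base_lr: float = 0.1) -> float:
    """Hypothetical cosine learning-rate decay from base_lr down to 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


def momentum_schedule(total_steps: int, base_lr: float = 0.1) -> list[float]:
    """Momentum value at every step, derived only from the learning rate."""
    return [
        critical_momentum(cosine_lr(step, total_steps, base_lr))
        for step in range(total_steps + 1)
    ]


if __name__ == "__main__":
    for step, mu in enumerate(momentum_schedule(total_steps=10)):
        print(f"step {step:2d}  lr={cosine_lr(step, 10):.4f}  mu={mu:.4f}")
```

Under this formula a learning rate of 0.0025 corresponds to the conventional momentum of 0.9, and momentum rises toward 1 as the learning rate decays. In a framework such as PyTorch, one plausible way to apply it is to write the value into each `param_group["momentum"]` of a `torch.optim.SGD` optimizer before every step; whether the paper does so is not stated in the abstract.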
