Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

Ivan Pasichnyk

arXiv:2603.28921·cs.LG·April 7, 2026


TL;DR

This paper introduces a parameter-free, time-varying momentum schedule inspired by critical damping, which accelerates training, diagnoses problematic layers, and enables targeted corrections in neural networks.

Contribution

The paper derives a momentum schedule from a physical principle, the critically damped harmonic oscillator, and uses it as a diagnostic tool for identifying and correcting failure modes in trained neural networks.

Findings

01

Beta-scheduling achieves 1.9x faster convergence on CIFAR-10 with ResNet-18.

02

The diagnostic identifies the same problematic layers across different optimizers.

03

Correcting identified layers improves accuracy with minimal retraining.

Abstract

Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: mu(t) = 1 - 2*sqrt(alpha(t)), where alpha(t) is the current learning rate. This beta-schedule requires zero free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-scheduling delivers 1.9x faster convergence to 90% accuracy compared to constant momentum. More importantly, the per-layer gradient attribution under this schedule produces a cross-optimizer invariant diagnostic: the same three problem layers are identified regardless of whether the model was trained with SGD or Adam (100% overlap). Surgical correction of only these layers fixes 62 misclassifications while retraining only 18% of…
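The formula in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names `beta_schedule` and `cosine_lr`, the cosine learning-rate schedule, and the clamp at zero are all assumptions introduced here. The clamp is needed only because the formula as stated goes negative once the learning rate exceeds 0.25.

```python
import math


def beta_schedule(lr: float) -> float:
    """Momentum from the abstract's formula: mu = 1 - 2 * sqrt(lr)."""
    if lr < 0:
        raise ValueError("learning rate must be non-negative")
    # Assumption of this sketch (not stated in the abstract): clamp at 0,
    # since 1 - 2*sqrt(lr) becomes negative for lr > 0.25.
    return max(0.0, 1.0 - 2.0 * math.sqrt(lr))


def cosine_lr(step: int, total_steps: int, base_lr: float = 0.1) -> float:
    """Hypothetical cosine learning-rate schedule, used only as example input."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


if __name__ == "__main__":
    total = 100
    for step in range(0, total + 1, 25):
        lr = cosine_lr(step, total)
        # Momentum is recomputed from the current learning rate at every step.
        print(f"step={step:3d}  lr={lr:.4f}  mu={beta_schedule(lr):.4f}")
```

Under this formula, the conventional constant momentum of 0.9 corresponds to a learning rate of 0.0025, and momentum rises toward 1 as the learning rate decays toward 0.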

