Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum
Kirby Banman, Liam Peet-Pare, Nidhi Hegde, Alona Fyshe, Martha White

TL;DR
This paper shows that stochastic gradient descent with momentum (SGDm) with a fixed step-size can become unstable and diverge under covariate shift because of resonance, a failure mode that arises outside the iid sampling regime. The claim is supported both theoretically and empirically.
Contribution
The paper shows that SGDm under covariate shift with a fixed step-size behaves as a parametric oscillator and can therefore resonate and diverge, extending the analysis of SGDm beyond the iid sampling assumption.
Findings
SGDm can diverge under covariate shift due to resonance.
Resonance phenomena persist in nonlinear neural network training.
The theoretical analysis is limited to the linear setting with periodic covariate shift and is supported by empirical results.
Abstract
Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on iid sampling. Yet, SGDm is often used outside this regime, in settings with temporally correlated input samples such as continual learning and reinforcement learning. Existing work has shown that SGDm with a decaying step-size can converge under Markovian temporal correlation. In this work, we show that SGDm under covariate shift with a fixed step-size can be unstable and diverge. In particular, we show SGDm under covariate shift is a parametric oscillator, and so can suffer from a phenomenon known as resonance. We approximate the learning system as a time-varying system of ordinary differential equations, and leverage existing theory to characterize the system's divergence/convergence as resonant/nonresonant modes. The theoretical result is limited to the linear setting with periodic covariate…
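The parametric-oscillator picture can be illustrated with a small toy simulation. The sketch below is not the paper's ODE analysis or its experiments; it is a discrete-time, one-dimensional least-squares problem with heavy-ball momentum, in which the input alternates between two values (a period-2 covariate shift). The function name `sgdm_error` and all numerical values (step-size 1.0, momentum 0.9, inputs with squares 0.9 and 2.9) are assumptions chosen for illustration. With these values the error recursion e_{t+1} = (1 + beta - lr * x^2) e_t - beta e_{t-1} is stable for either input held fixed, while the two-step update obtained by alternating them has an eigenvalue of magnitude greater than one.

```python
import math


def sgdm_error(xs, steps, lr=1.0, beta=0.9, w_star=2.0, w0=0.0):
    """Heavy-ball SGDm on 1-D least squares, cycling through the inputs in xs.

    Hypothetical helper for illustration only. Returns |w - w_star| after
    `steps` updates with a fixed step-size `lr` and momentum `beta`.
    """
    w, v = w0, 0.0
    for t in range(steps):
        x = xs[t % len(xs)]        # periodic covariate shift over the inputs
        y = w_star * x             # noiseless linear target
        grad = x * (w * x - y)     # gradient of 0.5 * (w * x - y) ** 2 w.r.t. w
        v = beta * v - lr * grad   # momentum (velocity) update
        w = w + v                  # parameter update
    return abs(w - w_star)


# Assumed illustrative inputs: squared values 0.9 and 2.9.
x_a, x_b = math.sqrt(0.9), math.sqrt(2.9)

err_a = sgdm_error([x_a], steps=200)           # input fixed at x_a
err_b = sgdm_error([x_b], steps=200)           # input fixed at x_b
err_shift = sgdm_error([x_a, x_b], steps=200)  # input alternates each step

print("fixed x_a:  ", err_a)
print("fixed x_b:  ", err_b)
print("alternating:", err_shift)
```

In this toy setting the same step-size and momentum converge on each input distribution separately but diverge once the input alternates between them, which is the qualitative behaviour the abstract attributes to resonance. The paper's own analysis treats periodic covariate shift in a continuous-time approximation, so the specific thresholds here should not be read as the paper's results.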
Taxonomy
Topics: Stochastic dynamics and bifurcation · Neural dynamics and brain function · Neural Networks and Applications
