Error dynamics of mini-batch gradient descent with random reshuffling for least squares regression
Jackie Lok, Rishi Sonthalia, Elizaveta Rebrova

TL;DR
This paper analyzes the error dynamics of mini-batch gradient descent with random reshuffling in least squares regression, revealing how batching influences convergence and generalization through spectral shrinkage effects.
Contribution
It introduces a novel analysis linking mini-batch dynamics to a sample cross-covariance matrix, uncovering subtle step size dependencies and spectral effects not captured by gradient flow.
Findings
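Mini-batch and full-batch gradient descent dynamics agree to leading order in the step size under the linear scaling rule.
With random reshuffling, mini-batch gradient descent converges to a limit that depends on the step size, an effect that a gradient flow analysis cannot detect.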
Batching induces spectral shrinkage affecting the learning process.
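The leading-order agreement can be checked by hand for a single epoch. The following is a sketch under assumed notation, not the paper's own: $n$ samples are split into $B$ equal batches $(X_b, y_b)$ of size $n/B$, indexed in the order they are visited, the per-batch loss is the mean squared error over the batch, and the mini-batch step size is $\alpha/B$ when the full-batch step size is $\alpha$.

```latex
% One mini-batch step with step size \alpha/B on a batch of n/B rows:
w \;\leftarrow\; w - \frac{\alpha}{B}\cdot\frac{B}{n}\, X_b^\top (X_b w - y_b)
  \;=\; w - \frac{\alpha}{n}\, X_b^\top (X_b w - y_b).

% Composing the B steps of one epoch, starting from w_0:
w_B \;=\; \Bigl[\prod_{b=B}^{1}\Bigl(I - \tfrac{\alpha}{n} X_b^\top X_b\Bigr)\Bigr] w_0
      \;+\; \frac{\alpha}{n}\sum_{b=1}^{B}
      \Bigl[\prod_{b'=B}^{b+1}\Bigl(I - \tfrac{\alpha}{n} X_{b'}^\top X_{b'}\Bigr)\Bigr] X_b^\top y_b .

% Expanding in \alpha and using \sum_b X_b^\top X_b = X^\top X,\ \sum_b X_b^\top y_b = X^\top y:
w_B \;=\; w_0 - \frac{\alpha}{n}\, X^\top (X w_0 - y) \;+\; O(\alpha^2).
```

The first-order term is exactly one full-batch gradient step with step size $\alpha$. The order of the batches only enters through the $O(\alpha^2)$ and higher terms, which is where the products of batch matrices fail to commute.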
Abstract
We study the discrete dynamics of mini-batch gradient descent with random reshuffling for least squares regression. We show that the training and generalization errors depend on a sample cross-covariance matrix between the original features and a set of new features in which each feature is modified by the mini-batches that appear before it during the learning process in an averaged way. Using this representation, we establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size using the linear scaling rule. However, mini-batch gradient descent with random reshuffling exhibits a subtle dependence on the step size that a gradient flow analysis cannot detect, such as converging to a limit that depends on the step size. By comparing , a non-commutative polynomial of random matrices, with the…
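A small numerical sketch can illustrate the leading-order agreement described in the abstract. It is not the authors' code. The function names, the synthetic Gaussian data, the problem sizes, and the specific convention (mean squared error per batch, mini-batch step size equal to the full-batch step size divided by the number of batches) are all assumptions chosen for illustration.

```python
import numpy as np


def full_batch_step(w, X, y, alpha):
    """One full-batch gradient step on the loss (1/(2n)) * ||X w - y||^2."""
    n = X.shape[0]
    grad = X.T @ (X @ w - y) / n
    return w - alpha * grad


def minibatch_epoch(w, X, y, alpha, num_batches, rng):
    """One epoch of mini-batch gradient descent with random reshuffling.

    The rows are permuted once, split into num_batches batches, and each
    batch is visited exactly once. The per-batch step size is
    alpha / num_batches (the assumed form of the linear scaling rule).
    """
    perm = rng.permutation(X.shape[0])
    for idx in np.array_split(perm, num_batches):
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(idx)
        w = w - (alpha / num_batches) * grad
    return w


def one_epoch_gap(alpha, seed=0):
    """Distance between one mini-batch epoch and one full-batch step.

    The same seed gives the same synthetic data and the same permutation
    for every alpha, so only the step size varies between calls.
    """
    rng = np.random.default_rng(seed)
    n, d, num_batches = 64, 5, 8  # hypothetical problem sizes
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    w0 = np.zeros(d)
    w_full = full_batch_step(w0, X, y, alpha)
    w_mini = minibatch_epoch(w0, X, y, alpha, num_batches, rng)
    return np.linalg.norm(w_full - w_mini)


if __name__ == "__main__":
    for alpha in (0.02, 0.01, 0.005):
        print(alpha, one_epoch_gap(alpha))
```

Under these assumptions the two updates share the same first-order term in the step size, so the printed gap should shrink roughly fourfold each time the step size is halved. The experiment covers a single epoch only and does not probe the step size-dependent limit.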
Taxonomy
Topics: Mineral Processing and Grinding · Neural Networks and Applications
Methods: Sparse Evolutionary Training
