Suspicious Alignment of SGD: A Fine-Grained Step Size Condition Analysis
Shenyang Deng, Boyao Liao, Zhuoli Ouyang, Tianyu Pang, Minhak Song, Yaoqing Yang

TL;DR
This paper analyzes the suspicious alignment phenomenon in SGD under ill-conditioned settings, revealing how step size influences the alignment behavior and its impact on loss reduction, supported by a fine-grained theoretical analysis.
Contribution
It introduces an adaptive step-size condition that explains the phase transition in gradient alignment and its effects on loss reduction in high-dimensional quadratic problems.
Findings
An adaptive step-size condition separates the alignment-decreasing and alignment-increasing regimes.
Projection to the bulk space can decrease loss, while projection to the dominant space can increase loss.
SGD exhibits two-phase behavior: alignment with the dominant subspace first decreases, then rises and stabilizes at a high level.
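The second finding can be illustrated with a minimal NumPy sketch. It is a toy construction under stated assumptions, not the paper's setup or proof: the Hessian is diagonal with a few large (dominant) and many small (bulk) eigenvalues, the gradient is the full deterministic gradient rather than a stochastic one, and alignment is taken to be the fraction of squared gradient norm lying in the dominant subspace. All names and constants (`make_quadratic`, `projected_step`, `eta = 0.03`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np

def make_quadratic(dim=50, k=3, lam_dom=100.0, lam_bulk=1.0):
    """Eigenvalues of a diagonal Hessian: k dominant, dim - k bulk (assumed toy spectrum)."""
    eigs = np.full(dim, lam_bulk)
    eigs[:k] = lam_dom
    return eigs

def loss(w, eigs):
    """Quadratic loss 0.5 * w^T H w for diagonal H."""
    return 0.5 * np.sum(eigs * w**2)

def alignment(g, k):
    """Assumed alignment measure: share of squared gradient norm in the first k coordinates."""
    return np.sum(g[:k] ** 2) / np.sum(g**2)

def projected_step(w, eigs, eta, k, space):
    """One gradient step using only the gradient component in the chosen subspace."""
    g = eigs * w                      # full gradient of the quadratic
    mask = np.zeros_like(w)
    if space == "dominant":
        mask[:k] = 1.0
    else:                             # "bulk"
        mask[k:] = 1.0
    return w - eta * mask * g

k = 3
eigs = make_quadratic(k=k)
w = np.ones_like(eigs)
eta = 0.03                            # eta * lam_dom = 3 > 2, eta * lam_bulk = 0.03 < 2

base = loss(w, eigs)
loss_dom = loss(projected_step(w, eigs, eta, k, "dominant"), eigs)
loss_bulk = loss(projected_step(w, eigs, eta, k, "bulk"), eigs)
align = alignment(eigs * w, k)

print(f"alignment with dominant subspace: {align:.4f}")
print(f"loss before: {base:.2f}, after dominant step: {loss_dom:.2f}, after bulk step: {loss_bulk:.2f}")
```

In this toy case the gradient is almost entirely aligned with the dominant subspace, yet the dominant-projected step overshoots (because the step size times the dominant eigenvalue exceeds 2) and raises the loss, while the bulk-projected step lowers it. This is one simple mechanism consistent with the finding; the paper's analysis of the stochastic, high-dimensional case may differ.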
Abstract
This paper explores the suspicious alignment phenomenon in stochastic gradient descent (SGD) under ill-conditioned optimization, where the Hessian spectrum splits into dominant and bulk subspaces. This phenomenon describes the behavior of gradient alignment in SGD updates. Specifically, during the initial phase of SGD updates, the alignment between the gradient and the dominant subspace tends to decrease. Subsequently, it enters a rising phase and eventually stabilizes in a high-alignment phase. The alignment is considered "suspicious" because, paradoxically, the projected gradient update along this highly aligned dominant subspace proves ineffective at reducing the loss. The focus of this work is to give a fine-grained analysis, in a high-dimensional quadratic setup, of how step size selection produces this phenomenon. Our main contributions can be summarized as follows: We propose a…
