Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
Filip Kovačević, Hong Chang Ji, Denny Wu, Mahdi Soltanolkotabi, Marco Mondelli

TL;DR
This paper shows that, for a single-index model with quadratic activation, full-batch gradient descent needs fewer samples than one-pass stochastic gradient descent, provided the activation is truncated.
Contribution
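It shows that, for a single-index model with quadratic activation, full-batch GD with a truncated activation achieves better sample complexity than one-pass (online) SGD, whereas full-batch spherical GD on the correlation loss without truncation retains the same extra factor in sample complexity as one-pass SGD.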
Findings
Full-batch GD with a truncated activation is more sample-efficient than one-pass SGD for a single-index model with quadratic activation.
Truncating the activation improves the optimization landscape for GD.
A trajectory analysis shows GD achieves strong recovery with fewer samples and steps.
Abstract
It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. However, beyond linear regression, the theoretical advantage of full-batch gradient descent (GD, which always reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) remains unclear. In this work, we consider learning a $d$-dimensional single-index model with a quadratic activation, for which it is known that one-pass SGD requires $n \gtrsim d \log d$ samples to achieve weak recovery. We first show that this $\log d$ factor in the sample complexity persists for full-batch spherical GD on the correlation loss; however, by simply truncating the activation, full-batch GD exhibits a favorable optimization landscape at $n \simeq d$ samples, thereby outperforming one-pass SGD (with the same activation) in…
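The two procedures compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes standard Gaussian inputs, labels y = <x, theta*>^2, and a hypothetical truncation min(z^2, tau) of the student activation. The truncation form, the threshold `tau`, the step sizes, and all function names are assumptions, since this summary does not specify them.

```python
import numpy as np

def make_data(n, d, rng):
    # Assumed data model: Gaussian inputs, quadratic single-index labels.
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)
    X = rng.standard_normal((n, d))
    y = (X @ theta_star) ** 2
    return X, y, theta_star

def act_grad(z, tau):
    # Derivative of the hypothetical truncated activation min(z^2, tau):
    # 2z where z^2 < tau, and 0 where the activation is clipped.
    # tau = np.inf recovers the plain quadratic activation.
    return np.where(z ** 2 < tau, 2.0 * z, 0.0)

def spherical_step(theta, grad, lr):
    # Project the gradient onto the tangent space of the unit sphere,
    # take a step, then renormalize back onto the sphere.
    grad_tan = grad - (grad @ theta) * theta
    theta = theta - lr * grad_tan
    return theta / np.linalg.norm(theta)

def full_batch_gd(X, y, theta0, lr, steps, tau=np.inf):
    # Correlation loss: L(theta) = -(1/n) * sum_i y_i * sigma(<x_i, theta>).
    # Every step reuses all n samples.
    theta = theta0.copy()
    for _ in range(steps):
        z = X @ theta
        grad = -(X.T @ (y * act_grad(z, tau))) / len(y)
        theta = spherical_step(theta, grad, lr)
    return theta

def one_pass_sgd(X, y, theta0, lr, tau=np.inf):
    # Same loss, but each sample is used exactly once, in order.
    theta = theta0.copy()
    for x_i, y_i in zip(X, y):
        z = x_i @ theta
        grad = -y_i * act_grad(z, tau) * x_i
        theta = spherical_step(theta, grad, lr)
    return theta
```

To compare the two on the same data, one could draw `X, y, theta_star = make_data(n, d, rng)`, run both routines from a shared random unit-norm `theta0`, and inspect the overlap `abs(theta @ theta_star)`. Whether a given run reflects the separation stated in the abstract depends on the choices of `n`, `d`, `lr`, and `tau`, which are not taken from the paper.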
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Sparse and Compressive Sensing Techniques
