Limitations of SGD for Multi-Index Models Beyond Statistical Queries
Daniel Barzilai, Ohad Shamir

TL;DR
This paper investigates the limitations of standard SGD for learning multi-index models. It argues that the SQ framework can give misleading predictions about SGD and proposes a new, non-SQ approach that applies to a range of models, including neural networks.
Contribution
The paper introduces a non-SQ framework for analyzing the limitations of vanilla SGD on multi-index models, extending the analysis beyond prior SQ-based results.
Findings
SQ framework can be misleading for SGD analysis
Standard SGD faces fundamental limitations in multi-index models
Results apply to deep neural network architectures
Abstract
Understanding the limitations of gradient methods, and stochastic gradient descent (SGD) in particular, is a central challenge in learning theory. To that end, a commonly used tool is the Statistical Queries (SQ) framework, which studies performance limits of algorithms based on noisy interaction with the data. However, it is known that the formal connection between the SQ framework and SGD is tenuous: Existing results typically rely on adversarial or specially-structured gradient noise that does not reflect the noise in standard SGD, and (as we point out here) can sometimes lead to incorrect predictions. Moreover, many analyses of SGD for challenging problems rely on non-trivial algorithmic modifications, such as restricting the SGD trajectory to the sphere or using very small learning rates. To address these shortcomings, we develop a new, non-SQ framework to study the limitations of…
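The distinction the abstract draws between SQ-style gradient noise and the noise in standard SGD can be illustrated with a small sketch. This is not the paper's construction: the target `target`, the linear student, the squared loss, and the parameters `tolerance` and `n_mc` are all hypothetical choices made for illustration, and the random sign perturbation is only a stand-in for an adversarial one.

```python
import random

def target(x):
    # Hypothetical 2-index target: depends on x only through x[0] and x[1].
    return x[0] * x[1]

def sample_gradient(w, x):
    # Gradient of 0.5 * (<w, x> - target(x))^2 with respect to w, for one sample.
    residual = sum(wi * xi for wi, xi in zip(w, x)) - target(x)
    return [residual * xi for xi in x]

def sgd_gradient(w, dim, rng):
    # Vanilla online SGD: the only noise comes from drawing a fresh Gaussian sample.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return sample_gradient(w, x)

def sq_style_gradient(w, dim, rng, tolerance, n_mc=2000):
    # SQ-style answer (assumed form): a Monte Carlo estimate of the population
    # gradient, plus a perturbation of magnitude `tolerance` per coordinate.
    # A real SQ oracle may choose this perturbation adversarially.
    avg = [0.0] * dim
    for _ in range(n_mc):
        g = sgd_gradient(w, dim, rng)
        avg = [a + gi / n_mc for a, gi in zip(avg, g)]
    return [a + tolerance * rng.choice((-1.0, 1.0)) for a in avg]

if __name__ == "__main__":
    rng = random.Random(0)
    w = [0.0] * 5
    print("single-sample SGD gradient:", sgd_gradient(w, 5, rng))
    print("SQ-style gradient:", sq_style_gradient(w, 5, rng, tolerance=0.05))
```

In this sketch the SGD gradient's noise is determined entirely by the data distribution, while the SQ-style answer carries a perturbation that is chosen independently of the sample. That is the kind of mismatch the abstract describes.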
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Graph Neural Networks · Generative Adversarial Networks and Image Synthesis
