Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis
Yunwei Ren, Jason D. Lee

TL;DR
This paper analyzes the sample complexity of learning orthogonal multi-index models with online SGD. It shows that using both the second- and higher-order terms of the link function's Hermite expansion, rather than only the lowest-degree term, gives a better sample complexity.
Contribution
It refines the information-exponent analysis for multi-index models: instead of looking only at the lowest Hermite degree of the link function, it also accounts for the higher-order terms, which lowers the sample complexity of online SGD.
Findings
Learning relevant subspace with second-order terms
Recovering exact directions with higher-order terms
Sample complexity of online SGD reduced to d P^{L-1}
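The first finding can be illustrated with a small numerical sketch. It is not the paper's online SGD procedure: it swaps in a plain second-moment (spectral) estimate, and it assumes a target of the form y = sum_k link(v_k . x) with Gaussian inputs and a hypothetical link h_2 + h_4 (normalized Hermite polynomials). All names below are illustrative.

```python
import numpy as np

def normalized_link(z):
    """Hypothetical link: h_2(z) + h_4(z), with unit-norm Hermite polynomials."""
    he2 = z**2 - 1.0
    he4 = z**4 - 6.0 * z**2 + 3.0
    return he2 / np.sqrt(2.0) + he4 / np.sqrt(24.0)

def sample_multi_index(n, V, rng):
    """Draw x ~ N(0, I_d) and y = sum_k link(v_k . x); rows of V are the v_k."""
    X = rng.standard_normal((n, V.shape[1]))
    y = normalized_link(X @ V.T).sum(axis=1)
    return X, y

def second_order_subspace(X, y, P):
    """Top-P eigenvectors of the empirical second-order moment E[y (x x^T - I)]."""
    n, d = X.shape
    M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(np.abs(eigvals))[-P:]
    return eigvecs[:, top]

rng = np.random.default_rng(0)
d, P = 8, 2
# Orthonormal ground-truth directions (rows of V), drawn at random for the demo.
V = np.linalg.qr(rng.standard_normal((d, P)))[0].T
X, y = sample_multi_index(200_000, V, rng)
U = second_order_subspace(X, y, P)

# Squared Frobenius norm of V U; it equals P when span(U) matches span(V).
overlap = np.linalg.norm(V @ U) ** 2
print(f"subspace overlap: {overlap:.3f} (maximum is {P})")
```

In this setup the population moment is proportional to sum_k v_k v_k^T, which is unchanged if the v_k are rotated within their span. That is why the second-order term identifies the subspace but not the individual directions, and why the higher-order terms are needed for the second finding.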
Abstract
The information exponent ([BAGJ21]) and its extensions -- which are equivalent to the lowest degree in the Hermite expansion of the link function (after a potential label transform) for Gaussian single-index models -- have played an important role in predicting the sample complexity of online stochastic gradient descent (SGD) in various learning tasks. In this work, we demonstrate that, for multi-index models, focusing solely on the lowest degree can miss key structural details of the model and result in suboptimal rates. Specifically, we consider the task of learning target functions of form $f_*(x) = \sum_{k=1}^{P} \phi(v_k^* \cdot x)$, where $P \ll d$, the ground-truth directions $\{v_k^*\}_{k=1}^{P}$ are orthonormal, and the information exponent of $\phi$ is $L$. Based on the theory of information exponent, when $L = 2$, only the…
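As a concrete reading of "lowest degree in the Hermite expansion", the sketch below estimates the Hermite coefficients of a link function by Gauss-Hermite quadrature and returns the lowest degree k >= 1 with a nonzero coefficient. The function names, the tolerance, and the degree cutoff are assumptions made for illustration, and no label transform is applied.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coefficients(phi, max_degree=8, quad_points=40):
    """Coefficients E[phi(z) h_k(z)], z ~ N(0, 1), for k = 0..max_degree.

    h_k = He_k / sqrt(k!) is the normalized probabilists' Hermite polynomial.
    phi must accept a NumPy array.
    """
    nodes, weights = hermegauss(quad_points)      # weight function exp(-z^2 / 2)
    weights = weights / math.sqrt(2.0 * math.pi)  # rescale to the N(0, 1) density
    values = phi(nodes)
    coeffs = []
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                            # selects He_k in hermeval
        h_k = hermeval(nodes, basis) / math.sqrt(math.factorial(k))
        coeffs.append(float(np.sum(weights * values * h_k)))
    return np.array(coeffs)

def information_exponent(phi, max_degree=8, tol=1e-8):
    """Lowest k >= 1 whose Hermite coefficient exceeds tol, or None if none does."""
    coeffs = hermite_coefficients(phi, max_degree)
    for k in range(1, max_degree + 1):
        if abs(coeffs[k]) > tol:
            return k
    return None

# He_3(z) = z^3 - 3z has no lower-degree component.
cubic = lambda z: z**3 - 3.0 * z
# He_2 + He_4: the second-order term sets the lowest degree.
mixed = lambda z: (z**2 - 1.0) + (z**4 - 6.0 * z**2 + 3.0)

print(information_exponent(cubic), information_exponent(mixed))
```

The `mixed` link is the kind of case the abstract is concerned with: its lowest degree is 2, yet it also carries a degree-4 term that a lowest-degree-only analysis ignores.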
Taxonomy
Topics: Neural Networks and Applications
Methods: Stochastic Gradient Descent
