Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis

Yunwei Ren; Jason D. Lee

arXiv:2410.09678 · cs.LG · October 7, 2025

TL;DR

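This paper analyzes the sample complexity of learning orthogonal multi-index models with online SGD. It shows that using both the second-order and the higher-order terms of the link function's Hermite expansion lets the learner first find the relevant subspace and then recover the exact directions, with a sample complexity of $d P^{L-1}$.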

Contribution

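The paper introduces a fine-grained analysis of the information exponent for multi-index models. It shows that focusing solely on the lowest degree of the Hermite expansion can miss key structural details of the model and lead to suboptimal rates, and that the higher-order terms can be used to learn more efficiently.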

Findings

1. Learning the relevant subspace with second-order terms
2. Recovering the exact directions with higher-order terms
3. Sample complexity reduced to $d P^{L-1}$
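
One way to see why second-order terms can identify the relevant subspace but not the individual directions is a small numerical check. The sketch below is an illustration only, not the paper's algorithm: the helper names `he2` and `he4`, the dimensions, and the random rotation are all assumptions made for this example. It uses the fact that, for orthonormal directions, the sum of second-order Hermite terms $\sum_k (v_k \cdot x)^2 - P$ depends only on the span of the directions, while the corresponding fourth-order sum does not.

```python
import numpy as np

def he2(z):
    # Probabilists' Hermite polynomial of degree 2.
    return z**2 - 1.0

def he4(z):
    # Probabilists' Hermite polynomial of degree 4.
    return z**4 - 6.0 * z**2 + 3.0

rng = np.random.default_rng(0)
d, P = 50, 4  # illustrative sizes (assumption), with P much smaller than d

# Rows of V are orthonormal directions in R^d.
V = np.linalg.qr(rng.standard_normal((d, P)))[0].T
# R is a generic P x P orthogonal matrix; R @ V spans the same subspace as V.
R = np.linalg.qr(rng.standard_normal((P, P)))[0]
V_rot = R @ V

X = rng.standard_normal((1000, d))  # Gaussian inputs

# Largest pointwise difference between the two parameterizations.
second_gap = np.max(np.abs(he2(X @ V.T).sum(axis=1) - he2(X @ V_rot.T).sum(axis=1)))
fourth_gap = np.max(np.abs(he4(X @ V.T).sum(axis=1) - he4(X @ V_rot.T).sum(axis=1)))

print(f"second-order gap: {second_gap:.2e}")
print(f"fourth-order gap: {fourth_gap:.2e}")
```

The second-order gap should sit at floating-point noise, since rotating the directions inside their span leaves that term unchanged, whereas the fourth-order gap is visibly nonzero. This matches the division of labor in the findings: second-order information for the subspace, higher-order information for the directions.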

Abstract

The information exponent ([BAGJ21]) and its extensions -- which are equivalent to the lowest degree in the Hermite expansion of the link function (after a potential label transform) for Gaussian single-index models -- have played an important role in predicting the sample complexity of online stochastic gradient descent (SGD) in various learning tasks. In this work, we demonstrate that, for multi-index models, focusing solely on the lowest degree can miss key structural details of the model and result in suboptimal rates. Specifically, we consider the task of learning target functions of the form $f_{*}(x) = \sum_{k=1}^{P} \phi(v_{k}^{*} \cdot x)$, where $P \ll d$, the ground-truth directions $\{v_{k}^{*}\}_{k=1}^{P}$ are orthonormal, and the information exponent of $\phi$ is $L$. Based on the theory of information exponent, when $L = 2$, only the…
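
To make the definitions in the abstract concrete, here is a minimal NumPy sketch, offered as an illustration under stated assumptions rather than as the authors' code. The function names (`hermite_coefficients`, `information_exponent`, `sample_orthonormal_directions`, `f_star`), the example link functions, and the tolerance are hypothetical. The information exponent is taken here to be the lowest degree $j \ge 1$ with a nonzero coefficient in the probabilists' Hermite expansion of the link function, computed by Gauss-Hermite quadrature, and no label transform is applied.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as H

def hermite_coefficients(phi, max_degree=6, quad_points=40):
    """Return c_j = E[phi(z) He_j(z)] / j! for z ~ N(0, 1), j = 0..max_degree."""
    # Quadrature nodes and weights for the weight function exp(-z^2 / 2).
    nodes, weights = H.hermegauss(quad_points)
    weights = weights / math.sqrt(2.0 * math.pi)  # rescale to the N(0, 1) density
    coeffs = []
    for j in range(max_degree + 1):
        basis = np.zeros(j + 1)
        basis[j] = 1.0  # coefficient vector selecting He_j
        he_j = H.hermeval(nodes, basis)
        coeffs.append(float(np.sum(weights * phi(nodes) * he_j)) / math.factorial(j))
    return np.array(coeffs)

def information_exponent(phi, tol=1e-8, **kwargs):
    """Lowest degree j >= 1 with |c_j| > tol, or None if none is found."""
    coeffs = hermite_coefficients(phi, **kwargs)
    for j in range(1, len(coeffs)):
        if abs(coeffs[j]) > tol:
            return j
    return None

def sample_orthonormal_directions(d, P, rng):
    """Return a (P, d) matrix whose rows are orthonormal directions v_k."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, P)))
    return Q.T

def f_star(X, V, phi):
    """Evaluate f_*(x) = sum_k phi(v_k . x) for each row x of X."""
    return phi(X @ V.T).sum(axis=1)

# Example link functions (assumptions chosen for illustration).
he3 = lambda z: z**3 - 3.0 * z                                # pure degree 3
mixed = lambda z: (z**2 - 1.0) + (z**4 - 6.0 * z**2 + 3.0)    # degrees 2 and 4

rng = np.random.default_rng(0)
V = sample_orthonormal_directions(d=20, P=3, rng=rng)
X = rng.standard_normal((5, 20))
y = f_star(X, V, mixed)

print(information_exponent(he3), information_exponent(mixed))
```

The `mixed` link has information exponent 2 but also carries a fourth-order term, which is the kind of structure a lowest-degree-only analysis would ignore.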

Videos

Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis · SlidesLive

Taxonomy

Topics: Neural Networks and Applications

Methods: Stochastic Gradient Descent