Gradient-Based Feature Learning under Structured Data
Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu

TL;DR
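This paper studies how structured input data, specifically a spiked covariance model, affects gradient-based feature learning of single index models. It shows that spherical gradient dynamics can fail in the anisotropic setting, that suitable weight normalization alleviates the failure, and that exploiting the structure lowers sample complexity compared with the isotropic case.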
Contribution
The paper characterizes how data structure affects gradient dynamics, introduces a weight normalization that improves recovery of the target direction, and shows reduced sample complexity in structured settings.
Findings
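Without normalization, spherical gradient dynamics may fail to recover the true direction on anisotropic data, even when the spike is perfectly aligned with the target direction.
Weight normalization reminiscent of batch normalization can alleviate this failure and improve direction recovery.
With structured data, gradient-based learning can achieve lower sample complexity and outperform kernel methods.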
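The role of normalization in the findings above can be illustrated with a minimal NumPy sketch. It is not the authors' training procedure. The covariance parameterization `(I + kappa * theta theta^T) / (1 + kappa)`, the function names, and the specific covariance-aware normalization `<w, x> / ||Sigma^{1/2} w||` are assumptions chosen for illustration. The sketch only shows that, under a spiked covariance, the scale of a unit-norm neuron's pre-activation depends on its direction, whereas the covariance-aware normalization keeps it at unit variance.

```python
import numpy as np

def spiked_covariance(theta, kappa):
    """Assumed spiked model: Sigma = (I + kappa * theta theta^T) / (1 + kappa)."""
    d = theta.shape[0]
    return (np.eye(d) + kappa * np.outer(theta, theta)) / (1.0 + kappa)

def sample_inputs(n, sigma, rng):
    """Draw n rows x ~ N(0, sigma) using the Cholesky factor of sigma."""
    chol = np.linalg.cholesky(sigma)
    return rng.standard_normal((n, sigma.shape[0])) @ chol.T

def spherical_preactivation(w, x):
    """Pre-activation with w projected onto the unit sphere: <w, x> / ||w||."""
    return x @ (w / np.linalg.norm(w))

def covariance_normalized_preactivation(w, x, sigma):
    """Pre-activation normalized by the input covariance: <w, x> / ||Sigma^{1/2} w||."""
    return (x @ w) / np.sqrt(w @ sigma @ w)

rng = np.random.default_rng(0)
d, kappa, n = 50, 20.0, 20000  # hypothetical sizes, chosen only for the demo

theta = np.zeros(d)
theta[0] = 1.0                 # spike direction
sigma = spiked_covariance(theta, kappa)
x = sample_inputs(n, sigma, rng)

w_aligned = theta.copy()       # neuron aligned with the spike
w_orth = np.zeros(d)
w_orth[1] = 1.0                # neuron orthogonal to the spike

# Spherical normalization: the variance depends on the neuron's direction.
var_sph_aligned = spherical_preactivation(w_aligned, x).var()
var_sph_orth = spherical_preactivation(w_orth, x).var()

# Covariance-aware normalization: the variance is close to 1 for both directions.
var_cov_aligned = covariance_normalized_preactivation(w_aligned, x, sigma).var()
var_cov_orth = covariance_normalized_preactivation(w_orth, x, sigma).var()
```

Under this assumed parameterization, the population variance of the spherically normalized pre-activation is 1 along the spike and 1 / (1 + kappa) orthogonal to it, so the input a neuron sees changes scale as it rotates. The covariance-aware version removes that dependence.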
Abstract
Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input…
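As a hedged illustration of the information exponent mentioned in the abstract, the sketch below uses the standard definition from the single index model literature: the smallest degree k ≥ 1 at which the link function has a non-zero Hermite coefficient under a standard Gaussian input. The function names, the quadrature size, and the tolerance are assumptions for this example, not part of the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coefficients(link, max_degree, quad_points=60):
    """Estimate E[link(z) * He_k(z)] for z ~ N(0, 1), k = 0..max_degree.

    Uses Gauss-Hermite quadrature for the probabilists' Hermite polynomials.
    These are unnormalized coefficients; dividing by sqrt(k!) would give the
    coefficients in the orthonormal Hermite basis.
    """
    nodes, weights = hermegauss(quad_points)
    weights = weights / np.sqrt(2.0 * np.pi)  # weights now sum to 1
    values = link(nodes)
    coeffs = np.empty(max_degree + 1)
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                        # selects He_k in hermeval
        coeffs[k] = np.sum(weights * values * hermeval(nodes, basis))
    return coeffs

def information_exponent(link, max_degree=10, tol=1e-8):
    """Smallest k >= 1 with a non-negligible Hermite coefficient, else None."""
    coeffs = hermite_coefficients(link, max_degree)
    for k in range(1, max_degree + 1):
        if abs(coeffs[k]) > tol:
            return k
    return None

# Example link functions (hypothetical choices for the demo).
k_linear = information_exponent(lambda z: z)             # He_1
k_quadratic = information_exponent(lambda z: z**2 - 1)   # He_2
k_cubic = information_exponent(lambda z: z**3 - 3 * z)   # He_3
```

A link function equal to a single Hermite polynomial He_k has information exponent k, which is what the three example calls check. The tolerance is a numerical convenience: a coefficient that is exactly zero in theory shows up as a value near machine precision under quadrature.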
Peer Reviews
Decision: NeurIPS 2023 poster
Originality: This paper introduces a new investigation into learning single-index models with a spiked covariance structure, expanding upon prior research that primarily focused on isotropic data. This unique focus on the impact of additional structure in the covariance matrix sets it apart from existing works. Furthermore, the introduction of weight normalization techniques and the exploration of their effects in anisotropic scenarios demonstrate originality in addressing the limitation
(1) The training procedure considered in the paper is not practical: The authors focus on a two-step training procedure that deviates from the standard gradient descent (GD) commonly used in practice. It is challenging to assess the extent to which the results obtained from this training procedure translate to impacts on standard training algorithms. (2) Differences from the existing works: Some existing works, such as Refs [BBSS22] [BAGJ21], have establishe
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Hydrological Forecasting Using AI · Gaussian Processes and Bayesian Inference
Methods: Batch Normalization · Weight Normalization
