Gradient-Based Feature Learning under Structured Data

Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu

arXiv:2309.03843 · stat.ML · September 8, 2023

TL;DR

This paper explores how structured data, specifically spiked covariance models, influences the effectiveness and sample complexity of gradient-based feature learning, revealing new phenomena and improvements over isotropic assumptions.

Contribution

It demonstrates the impact of data structure on gradient dynamics, introduces normalization techniques to improve recovery, and shows reduced sample complexity in structured settings.

Findings

01. Spherical gradient dynamics may fail to recover the target direction on anisotropic data without normalization.

02. Appropriate weight normalization can improve direction recovery.

03. With structured data, gradient-based learning can attain lower sample complexity and outperform kernel methods.

Abstract

Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input…
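
The setting in the abstract can be made concrete with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation. The covariance parameterization `(I + kappa * theta theta^T) / (1 + kappa)`, the link `z**2 - 1`, the correlation-style loss, and all function names (`spiked_covariance`, `sample_single_index`, `spherical_step`, `normalized_preactivation`) are hypothetical choices made here; the text above does not specify them. The normalization `<w, x> / ||Sigma^{1/2} w||` is one plausible reading of "weight normalization that is reminiscent of batch normalization".

```python
import numpy as np


def spiked_covariance(theta, kappa):
    # Assumed parameterization (not given in the text above):
    # Sigma = (I + kappa * theta theta^T) / (1 + kappa), with ||theta|| = 1.
    d = theta.shape[0]
    return (np.eye(d) + kappa * np.outer(theta, theta)) / (1.0 + kappa)


def sample_single_index(n, u, sigma, link, rng):
    # x ~ N(0, Sigma); labels depend on x only through the projection <u, x>.
    chol = np.linalg.cholesky(sigma)
    x = rng.standard_normal((n, u.shape[0])) @ chol.T
    return x, link(x @ u)


def spherical_step(w, grad, lr):
    # Spherical gradient step: drop the radial component of the gradient,
    # take a step, then project back onto the unit sphere.
    tangent = grad - np.dot(grad, w) * w
    w_new = w - lr * tangent
    return w_new / np.linalg.norm(w_new)


def normalized_preactivation(w, x, sigma):
    # Assumed normalization: <w, x> / ||Sigma^{1/2} w||, which has unit
    # variance under x ~ N(0, Sigma) for any nonzero w.
    return (x @ w) / np.sqrt(w @ sigma @ w)


rng = np.random.default_rng(0)
d, n, kappa = 10, 20_000, 10.0
theta = np.eye(d)[0]   # spike direction
u = theta.copy()       # target direction, perfectly aligned with the spike
sigma = spiked_covariance(theta, kappa)
x, y = sample_single_index(n, u, sigma, lambda z: z**2 - 1.0, rng)

w = np.eye(d)[1]       # a unit vector orthogonal to the spike
raw_var = np.var(x @ w)                                    # depends on the direction of w
norm_var = np.var(normalized_preactivation(w, x, sigma))   # direction-independent scale

# One spherical step on an illustrative correlation-style loss, -mean(y * <w, x>^2).
grad = -2.0 * np.mean((y * (x @ w))[:, None] * x, axis=0)
w_next = spherical_step(w, grad, lr=0.1)
```

Under the assumed covariance, the raw pre-activation `<w, x>` has variance 1 along the spike and `1 / (1 + kappa)` orthogonal to it, so its scale changes as `w` rotates on the sphere, whereas the normalized pre-activation keeps unit variance in every direction. The sketch only shows this scaling difference; it does not reproduce the paper's recovery or sample-complexity results.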

Peer Reviews

Decision · NeurIPS 2023 poster

Reviewer 01 · Rating 5 (Borderline accept: Technically solid paper where reasons to accept outweigh reasons to reject, e.g., limited evaluation. Please use sparingly.) · Confidence 3

Strengths

Originality: This paper introduces a new investigation into learning single-index models with a spiked covariance structure, expanding upon prior research that primarily focused on isotropic data. This unique focus on the impact of additional structure in the covariance matrix sets it apart from existing works. Furthermore, the introduction of weight normalization techniques and the exploration of their effects in anisotropic scenarios demonstrate originality in addressing the limitation

Weaknesses

(1) The training procedure considered in the paper is not practical: The authors focus on a two-step training procedure that deviates from the standard gradient descent (GD) commonly used in practice. It is challenging to assess the extent to which the results obtained from this training procedure translate to impacts on standard training algorithms. (2) Differences from the existing works: Some existing works, such as Refs [BBSS22] [BAGJ21], have establishe

Videos

Gradient-Based Feature Learning under Structured Data · slideslive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Hydrological Forecasting Using AI · Gaussian Processes and Bayesian Inference

Methods: Batch Normalization · Weight Normalization