Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

Kazusato Oko; Yujin Song; Taiji Suzuki; Denny Wu

arXiv:2406.11828 · cs.LG · June 18, 2024

Open Access

TL;DR

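This paper studies the sample and computational complexity of learning additive models built from $M$ ridge functions with diverse, near-orthogonal index features, where $M$ grows with the dimension $d$. It shows that a large subset of polynomial targets can be learned efficiently by gradient-based training, and complements this with statistical query (SQ) hardness results.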

Contribution

The paper analyzes the learnability of additive models with diverse ridge functions, showing that a large subset of polynomial targets can be trained in polynomial time with gradient-based methods, and establishing computational hardness through statistical query (SQ) lower bounds.

Findings

01. A large class of additive target functions is learnable in polynomial time.

02. Gradient-based training of neural networks learns these targets efficiently.

03. Statistical query (SQ) lower bounds establish the matching computational hardness results.

Abstract

We study the computational and sample complexity of learning a target function $f_{*} : \mathbb{R}^{d} \to \mathbb{R}$ with additive structure, that is, $f_{*}(x) = \frac{1}{M} \sum_{m=1}^{M} f_{m}(\langle x, v_{m} \rangle)$, where $f_{1}, f_{2}, \ldots, f_{M} : \mathbb{R} \to \mathbb{R}$ are nonlinear link functions of single-index models (ridge functions) with diverse and near-orthogonal index features $\{v_{m}\}_{m=1}^{M}$, and the number of additive tasks $M$ grows with the dimensionality as $M \asymp d^{\gamma}$ for $\gamma \geq 0$. This problem setting is motivated by the classical additive model literature, the recent representation learning theory of two-layer neural networks, and large-scale pretraining, where the model simultaneously acquires a large number of "skills" that are often localized in distinct parts of the trained network. We prove that a large subset of polynomial $f_{*}$ can be efficiently learned by…
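As a rough illustration of the target class defined in the abstract, the following minimal Python sketch builds one instance of $f_{*}(x) = \frac{1}{M} \sum_{m=1}^{M} f_{m}(\langle x, v_{m} \rangle)$ with $M \approx d^{\gamma}$. Several choices here are assumptions made only for illustration and are not taken from the paper: the index features are independent random unit vectors (which are near-orthogonal in high dimension), every link function is the same probabilists' Hermite polynomial, the input is standard Gaussian, and the names `make_target`, `hermite_he`, and `random_unit_vector` are hypothetical.

```python
import math
import random


def hermite_he(k, z):
    """Probabilists' Hermite polynomial He_k(z), via He_{n+1} = z*He_n - n*He_{n-1}."""
    h_prev, h = 1.0, z  # He_0 and He_1
    if k == 0:
        return h_prev
    for n in range(1, k):
        h_prev, h = h, z * h - n * h_prev
    return h


def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def random_unit_vector(d, rng):
    """Normalized Gaussian vector, i.e. a uniformly random direction in R^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(g, g))
    return [t / norm for t in g]


def make_target(d, gamma, rng, degree=2):
    """Return (f_star, V) for an additive target with M = round(d**gamma) ridge terms.

    Assumption: all link functions f_m equal He_degree; the abstract only
    requires them to be nonlinear link functions.
    """
    M = max(1, round(d ** gamma))
    V = [random_unit_vector(d, rng) for _ in range(M)]

    def f_star(x):
        # Average of the ridge functions f_m(<x, v_m>).
        return sum(hermite_he(degree, dot(x, v)) for v in V) / M

    return f_star, V


rng = random.Random(0)
d, gamma = 64, 0.5
f_star, V = make_target(d, gamma, rng)

# Assumed input distribution: standard Gaussian in R^d.
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = f_star(x)

# Largest pairwise overlap |<v_i, v_j>|, a crude check of near-orthogonality.
max_overlap = max(
    abs(dot(V[i], V[j])) for i in range(len(V)) for j in range(i + 1, len(V))
)
print(len(V), y, max_overlap)
```

With `d = 64` and `gamma = 0.5` the sketch uses 8 ridge terms. Random directions are only one way to obtain near-orthogonal features; exactly orthogonal vectors, such as distinct standard basis vectors, would be a special case.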

Taxonomy

Topics: Neural Networks and Applications