Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations
Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu

TL;DR
This paper studies the learnability and computational hardness of additive models built from diverse ridge functions, and shows that a large subset of polynomial targets can be trained efficiently with gradient-based methods even as the number of additive components grows with the dimension.
Contribution
The paper analyzes the learnability of additive models with many diverse ridge functions, showing polynomial-time training under certain conditions and establishing statistical query (SQ) lower bounds on computational hardness.
Findings
Polynomial-time learnability for a large class of functions.
Gradient descent effectively trains neural networks for these models.
Computational hardness results via SQ lower bounds.
Abstract
We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x) = \frac{1}{\sqrt{M}}\sum_{m=1}^M f_m(\langle x, v_m\rangle)$, where $f_1, f_2, \dots, f_M:\mathbb{R}\to\mathbb{R}$ are nonlinear link functions of single-index models (ridge functions) with diverse and near-orthogonal index features $\{v_m\}_{m=1}^M$, and the number of additive tasks $M$ grows with the dimensionality $M\asymp d^\gamma$ for $\gamma\ge 0$. This problem setting is motivated by the classical additive model literature, the recent representation learning theory of two-layer neural networks, and large-scale pretraining where the model simultaneously acquires a large number of "skills" that are often localized in distinct parts of the trained network. We prove that a large subset of polynomial $f_*$ can be efficiently learned by…
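To make the additive structure concrete, here is a minimal NumPy sketch of a target that sums ridge functions over distinct index directions. It is an illustration, not the paper's construction: the $1/\sqrt{M}$ scaling, the exactly orthonormal directions (a special case of near-orthogonal), the Hermite polynomial links, the Gaussian inputs, and the names `make_index_features` and `additive_ridge_target` are all choices made for this sketch.

```python
import numpy as np

def make_index_features(d, M, rng):
    """Return M orthonormal directions in R^d as rows (assumes M <= d)."""
    G = rng.standard_normal((d, M))
    Q, _ = np.linalg.qr(G)  # reduced QR: Q is (d, M) with orthonormal columns
    return Q.T

def additive_ridge_target(X, V, links):
    """Evaluate M^{-1/2} * sum_m f_m(<x, v_m>) for every row x of X."""
    M = V.shape[0]
    Z = X @ V.T  # (n, M) matrix of projections <x, v_m>
    out = np.zeros(X.shape[0])
    for m, f in enumerate(links):
        out += f(Z[:, m])
    return out / np.sqrt(M)

# Hypothetical link functions: probabilists' Hermite polynomials He_2 and He_3.
def he2(z):
    return z**2 - 1

def he3(z):
    return z**3 - 3 * z

rng = np.random.default_rng(0)
d, M, n = 64, 8, 1000
V = make_index_features(d, M, rng)
links = [he2 if m % 2 == 0 else he3 for m in range(M)]

X = rng.standard_normal((n, d))  # assumed isotropic Gaussian inputs
y = additive_ridge_target(X, V, links)
```

Each row of `V` plays the role of one index feature, and each entry of `links` the corresponding link function; a learner would see only `(X, y)` and would have to recover both.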
Taxonomy
Topics: Neural Networks and Applications
