Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Feature Learning by Learnable Channel Attention
Yingzhen Yang

TL;DR
This paper demonstrates that over-parameterized two-layer neural networks with channel attention can learn low-degree spherical polynomials efficiently, achieving minimax optimal risk bounds with significantly reduced sample complexity.
Contribution
It introduces a novel neural network training method with channel attention that achieves minimax optimal learning rates for low-degree spherical polynomials, with provable channel selection.
Findings
Sample complexity is $n \thicksim d^{\text{degree}}/\text{error}$, significantly better than previous bounds.
The trained network achieves the minimax optimal nonparametric regression risk rate.
A two-stage training process includes a provable channel selection algorithm.
Abstract
We study the problem of learning a low-degree spherical polynomial of degree $\ell_0$ defined on the unit sphere in $\mathbb{R}^d$ by training an over-parameterized two-layer neural network (NN) with channel attention. Our main result is a significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\varepsilon$, a carefully designed two-layer NN with channel attention and finite width trained by vanilla gradient descent (GD) requires the lowest sample complexity of $n \asymp \Theta(d^{\ell_0}/\varepsilon)$ with high probability, in contrast with the representative sample complexity $\Theta\left(d^{\ell_0} \max\{\varepsilon^{-2}, \log d\}\right)$, where $n$ is the training data size. Moreover, such sample complexity is not improvable since the trained network renders a sharp rate of the nonparametric regression…
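To make the gap between the two sample-complexity orders concrete, the following is a minimal arithmetic sketch. It evaluates only the order-level expressions quoted in the abstract, with all hidden constants dropped. The function names and the values of `d`, `degree`, and `eps` are illustrative assumptions, not taken from the paper.

```python
import math


def sample_complexity_attention(d, degree, eps):
    """Order-level sample size d^degree / eps (constants dropped; illustrative)."""
    return d ** degree / eps


def sample_complexity_representative(d, degree, eps):
    """Order-level sample size d^degree * max(eps^-2, log d) (constants dropped)."""
    return d ** degree * max(eps ** -2, math.log(d))


# Hypothetical values chosen only for illustration.
d, degree, eps = 100, 2, 0.1

n_new = sample_complexity_attention(d, degree, eps)
n_old = sample_complexity_representative(d, degree, eps)
ratio = n_old / n_new  # how many times more samples the representative order needs

print(f"d^l0 / eps               ~ {n_new:.3g}")
print(f"d^l0 * max(eps^-2, logd) ~ {n_old:.3g}")
print(f"ratio                    ~ {ratio:.3g}")
```

Whenever $\varepsilon^{-2} \ge \log d$, the ratio of the two expressions reduces to $1/\varepsilon$, so under these order-level expressions the saving grows as the target risk shrinks.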
