Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Feature Learning by Learnable Channel Attention

Yingzhen Yang

arXiv:2512.20562·stat.ML·April 28, 2026

TL;DR

This paper demonstrates that over-parameterized two-layer neural networks with channel attention can learn low-degree spherical polynomials efficiently, achieving minimax optimal risk bounds with significantly reduced sample complexity.

Contribution

It introduces a novel neural network training method with channel attention that achieves minimax optimal learning rates for low-degree spherical polynomials, with provable channel selection.

Findings

01

Sample complexity is $n \thicksim d^{\text{degree}}/\text{error}$, significantly better than previous bounds.

02

The trained network achieves the minimax optimal nonparametric regression risk rate.

03

A two-stage training process includes a provable channel selection algorithm.

Abstract

We study the problem of learning a low-degree spherical polynomial of degree $\ell_0 = \Theta(1) \geq 1$ defined on the unit sphere in $\mathbb{R}^{d}$ by training an over-parameterized two-layer neural network (NN) with channel attention in this paper. Our main result is the significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\varepsilon \in (0, 1)$, a carefully designed two-layer NN with channel attention and finite width trained by the vanilla gradient descent (GD) requires the lowest sample complexity of $n \asymp \Theta(d^{\ell_0} / \varepsilon)$ with high probability, in contrast with the representative sample complexity $\Theta\left(d^{\ell_0} \max\left\{\varepsilon^{-2}, \log d\right\}\right)$, where $n$ is the training data size. Moreover, such sample complexity is not improvable since the trained network renders a sharp rate of the nonparametric regression…
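
To get a feel for the gap between the two rates quoted in the abstract, the following minimal sketch evaluates both expressions numerically. It assumes every constant hidden by the $\Theta(\cdot)$ notation equals 1, which the abstract does not state, so only the ratio between the two quantities is meaningful, not the absolute sample counts. The function names are hypothetical and are not taken from the paper.

```python
import math


def n_channel_attention(d: int, ell0: int, eps: float) -> float:
    """Rate d^ell0 / eps, with the hidden constant assumed to be 1."""
    return d ** ell0 / eps


def n_representative(d: int, ell0: int, eps: float) -> float:
    """Rate d^ell0 * max(eps^-2, log d), hidden constant assumed to be 1."""
    return d ** ell0 * max(eps ** -2, math.log(d))


def improvement_ratio(d: int, ell0: int, eps: float) -> float:
    """How many times more samples the representative rate asks for."""
    return n_representative(d, ell0, eps) / n_channel_attention(d, ell0, eps)


if __name__ == "__main__":
    d, ell0 = 100, 2  # illustrative dimension and degree, not from the paper
    for eps in (0.5, 0.1, 0.01):
        ratio = improvement_ratio(d, ell0, eps)
        print(f"eps={eps}: ratio of the two rates is about {ratio:.1f}")
```

Under this assumption the ratio simplifies to $\max\{1/\varepsilon,\ \varepsilon \log d\}$, so the advantage of the $d^{\ell_0}/\varepsilon$ rate grows as the target risk $\varepsilon$ shrinks.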
