When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work
Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo

TL;DR
This paper studies the expressivity and trainability of narrow one-hidden-layer neural networks with fewer than n neurons (n = sample size). It shows that such networks can be as expressive as wider ones and identifies a region with a favorable optimization landscape.
Contribution
It provides theoretical guarantees for the expressivity and the optimization landscape of narrow networks, and proposes a constrained training method with promising empirical results.
Findings
Narrow networks with width m ≥ 2n/d (d = input dimension) admit a global minimizer with zero training loss.
A local region exists that contains no local minima or saddle points.
Projected gradient methods outperform SGD in training narrow networks.
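The width condition in the first finding is simple arithmetic: m ≥ 2n/d is equivalent to m·d ≥ 2n, i.e. the m × d first-layer weight matrix (biases ignored) has at least twice as many entries as there are training samples. A minimal sketch that computes the smallest integer width meeting the bound; the helper names are hypothetical, and the sketch only checks the inequality, assuming the theorem's other conditions (one hidden layer, smooth activation) hold.

```python
def min_width(n: int, d: int) -> int:
    """Smallest integer m with m >= 2n/d, i.e. ceil(2n/d), using integer arithmetic."""
    if n <= 0 or d <= 0:
        raise ValueError("n and d must be positive")
    return (2 * n + d - 1) // d


def satisfies_width_condition(m: int, n: int, d: int) -> bool:
    """Check m >= 2n/d without floating point by comparing m*d against 2n."""
    return m * d >= 2 * n


if __name__ == "__main__":
    for n, d in [(1000, 50), (1000, 30)]:
        m = min_width(n, d)
        print(f"n={n}, d={d}: minimal width m={m} (fewer than n: {m < n})")
```

For d > 2 the bound 2n/d is smaller than n, which is the "fewer than n neurons" regime named in the title.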
Abstract
Modern neural networks are often quite wide, causing large memory and computation costs. It is thus of great interest to train a narrower network. However, training narrow neural nets remains a challenging task. We ask two theoretical questions: Can narrow networks have as strong expressivity as wide ones? If so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for 1-hidden-layer networks with fewer than $n$ (sample size) neurons when the activation is smooth. First, we prove that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), its expressivity is strong, i.e., there exists at least one global minimizer with zero training loss. Second, we identify a nice local region with no local-min or saddle points. Nevertheless, it is not clear whether gradient descent can stay in this nice…
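The abstract asks whether gradient descent can stay in the nice region, and the findings credit projected gradient methods for training narrow networks. The idea of projection is to take a gradient step and then map the iterate back into a constraint set, so it never leaves that set. A minimal sketch of generic projected gradient descent on a one-hidden-layer tanh network of width m < n, under stated assumptions: the squared loss, tanh as the smooth activation, the constraint set (each neuron's input weight vector confined to an L2 ball of hypothetical radius `radius`), and all function names are illustrative choices, not the paper's constraint set or code.

```python
import numpy as np


def forward(W, v, X):
    """One-hidden-layer net f(x) = sum_j v_j * tanh(w_j^T x); W is (m, d), v is (m,)."""
    return np.tanh(X @ W.T) @ v


def loss(W, v, X, y):
    """Squared loss 0.5 * mean((f(x_i) - y_i)^2)."""
    r = forward(W, v, X) - y
    return 0.5 * np.mean(r ** 2)


def grads(W, v, X, y):
    """Analytic gradients of the loss with respect to W and v."""
    n = X.shape[0]
    H = np.tanh(X @ W.T)                          # (n, m) hidden activations
    r = H @ v - y                                 # (n,) residuals
    grad_v = H.T @ r / n                          # (m,)
    G = (r[:, None] * (1.0 - H ** 2)) * v[None, :]  # tanh'(z) = 1 - tanh(z)^2
    grad_W = G.T @ X / n                          # (m, d)
    return grad_W, grad_v


def project_rows(W, radius):
    """HYPOTHETICAL constraint set: clip each neuron's weight vector to an L2 ball.
    This is a convex set chosen for illustration, not the set used in the paper."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return W * scale


def projected_gd(X, y, m, radius=3.0, lr=0.1, steps=500, seed=0):
    """Projected gradient descent: gradient step on (W, v), then project W back."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = project_rows(rng.normal(size=(m, d)) / np.sqrt(d), radius)
    v = rng.normal(size=m) / np.sqrt(m)
    history = [loss(W, v, X, y)]
    for _ in range(steps):
        gW, gv = grads(W, v, X, y)
        W = project_rows(W - lr * gW, radius)  # step, then return to the constraint set
        v = v - lr * gv                        # output weights left unconstrained here
        history.append(loss(W, v, X, y))
    return W, v, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 40, 10
    X, y = rng.normal(size=(n, d)), rng.normal(size=n)
    m = (2 * n + d - 1) // d                   # width 8 < n = 40
    W, v, hist = projected_gd(X, y, m)
    print(f"width {m}: loss {hist[0]:.4f} -> {hist[-1]:.4f}")
```

Replacing `project_rows` with the identity recovers plain gradient descent on the same data, which is one way to set up a constrained-versus-unconstrained comparison under these assumptions.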
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
