When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work

Jiawei Zhang; Yushun Zhang; Mingyi Hong; Ruoyu Sun; Zhi-Quan Luo

arXiv:2210.12001·cs.LG·October 24, 2022

TL;DR

This paper studies the expressivity and trainability of 1-hidden-layer neural networks with fewer than $n$ neurons, where $n$ is the sample size. It shows that such narrow networks can still reach zero training loss and identifies a local region with a benign optimization landscape.

Contribution

It provides theoretical guarantees on the expressivity and the optimization landscape of narrow networks, and proposes a constrained training method with promising empirical results.

Findings

01. Narrow networks with width $m \geq 2n/d$ can achieve zero training loss.

02. A local region exists with no local minima or saddle points.

03. Projected gradient methods outperform SGD in training narrow networks.
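
To make the third finding concrete, the following is a minimal sketch of projected gradient descent on a narrow 1-hidden-layer tanh network, using only the Python standard library. It is not the paper's algorithm: the feasible set used here (a box of half-width `radius` around the initial outer weights `v0`), the tanh activation, the step size, and all function names are assumptions chosen for illustration, since this page does not state the paper's actual constraint.

```python
import math
import random

def forward(W, v, x):
    """1-hidden-layer net: f(x) = sum_j v[j] * tanh(<W[j], x>)."""
    return sum(vj * math.tanh(sum(wi * xi for wi, xi in zip(wj, x)))
               for wj, vj in zip(W, v))

def loss(W, v, X, y):
    """Squared training loss, averaged over samples (with a 1/2 factor)."""
    return sum((forward(W, v, x) - t) ** 2 for x, t in zip(X, y)) / (2 * len(X))

def project_box(v, v0, radius):
    """Euclidean projection of v onto the box {u : |u_j - v0_j| <= radius}.
    The box is a hypothetical stand-in for the paper's feasible region."""
    return [min(max(vj, cj - radius), cj + radius) for vj, cj in zip(v, v0)]

def projected_gd(X, y, m, steps=200, lr=0.05, radius=0.5, seed=0):
    """Full-batch gradient step on (W, v), then project v back onto the box."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    v0 = [rng.choice([-1.0, 1.0]) for _ in range(m)]
    v = list(v0)
    history = [loss(W, v, X, y)]
    for _ in range(steps):
        gW = [[0.0] * d for _ in range(m)]
        gv = [0.0] * m
        for x, t in zip(X, y):
            act = [math.tanh(sum(wi * xi for wi, xi in zip(wj, x))) for wj in W]
            err = (sum(vj * a for vj, a in zip(v, act)) - t) / n
            for j in range(m):
                gv[j] += err * act[j]
                # d tanh(z)/dz = 1 - tanh(z)^2
                coef = err * v[j] * (1.0 - act[j] ** 2)
                for i in range(d):
                    gW[j][i] += coef * x[i]
        W = [[w - lr * g for w, g in zip(wj, gj)] for wj, gj in zip(W, gW)]
        v = project_box([vj - lr * g for vj, g in zip(v, gv)], v0, radius)
        history.append(loss(W, v, X, y))
    return W, v, v0, history

if __name__ == "__main__":
    rng = random.Random(1)
    n, d, m = 6, 3, 4  # m = 4 satisfies 2n/d = 4 <= m < n = 6
    X = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
    y = [rng.uniform(-1, 1) for _ in range(n)]
    W, v, v0, history = projected_gd(X, y, m)
    print("initial loss:", history[0], "final loss:", history[-1])
```

Only the outer weights are projected in this sketch, so the hidden-layer weights `W` remain unconstrained. Whether the loss reaches zero depends on the data, the step size, and the chosen box, so the printed values should be read as a diagnostic rather than a guarantee.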

Abstract

Modern neural networks are often quite wide, causing large memory and computation costs. It is thus of great interest to train a narrower network. However, training narrow neural nets remains a challenging task. We ask two theoretical questions: Can narrow networks have as strong expressivity as wide ones? If so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for 1-hidden-layer networks with fewer than $n$ (sample size) neurons when the activation is smooth. First, we prove that as long as the width $m \geq 2 n / d$ (where $d$ is the input dimension), its expressivity is strong, i.e., there exists at least one global minimizer with zero training loss. Second, we identify a nice local region with no local-min or saddle points. Nevertheless, it is not clear whether gradient descent can stay in this nice…
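
As a small worked example of the width condition $m \geq 2n/d$ stated in the abstract, the sketch below computes the smallest admissible integer width and checks whether a given width is both expressive in this sense and narrower than the sample size. The function names `min_width` and `is_narrow_and_expressive` are hypothetical helpers introduced here, not notation from the paper.

```python
def min_width(n, d):
    """Smallest integer m with m >= 2n/d (n = sample size, d = input dimension)."""
    if n <= 0 or d <= 0:
        raise ValueError("n and d must be positive integers")
    return -(-2 * n // d)  # integer ceiling of 2n/d

def is_narrow_and_expressive(m, n, d):
    """True when 2n/d <= m < n, i.e. the width meets the bound with fewer than n neurons."""
    return min_width(n, d) <= m < n

if __name__ == "__main__":
    print(min_width(1000, 100))                    # 20
    print(is_narrow_and_expressive(20, 1000, 100)) # True
    print(is_narrow_and_expressive(10, 10, 2))     # False
```

Since $2n/d < n$ holds only when $d > 2$, the "fewer than $n$ neurons" regime is non-empty only for input dimension larger than 2, which is why the last call returns `False`.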

Videos

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent