On the Landscape of One-hidden-layer Sparse Networks and Beyond

Dachao Lin; Ruoyu Sun; Zhihua Zhang

arXiv:2009.07439 · cs.LG · May 18, 2022

TL;DR

This paper investigates the loss landscape of one-hidden-layer sparse neural networks, revealing conditions under which they have no spurious valleys or minima, and highlighting differences from dense networks.

Contribution

It provides the first theoretical analysis of the loss landscape of sparse neural networks, identifying when spurious valleys and minima can occur.

Findings

1. Linear sparse networks with a dense final layer can have no spurious valleys under special sparse structures.
2. Non-linear sparse networks with a wide, dense final layer can also have no spurious valleys.
3. Wide sparse networks with a sparse final layer can have spurious valleys and spurious minima.

Abstract

Sparse neural networks have received increasing interest due to their small size compared to dense networks. Nevertheless, most existing works on neural network theory have focused on dense neural networks, and the understanding of sparse networks is very limited. In this paper, we study the loss landscape of one-hidden-layer sparse networks. First, we consider sparse networks with a dense final layer. We show that linear networks can have no spurious valleys under special sparse structures, and that non-linear networks can also have no spurious valleys when the final layer is wide. Second, we discover that spurious valleys and spurious minima can exist for wide sparse networks with a sparse final layer. This differs from wide dense networks, which do not have spurious valleys under mild assumptions.
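To make the two settings in the abstract concrete, the following is a minimal sketch of a one-hidden-layer network whose weights are constrained by fixed binary masks. The mask formulation, the squared loss, the ReLU activation, and all names (`sparse_one_hidden_layer_loss`, `M1`, `M2_dense`, `M2_sparse`) are illustrative assumptions, not notation taken from the paper. The sketch only evaluates the loss for a dense versus a sparse final layer; it does not demonstrate the presence or absence of spurious valleys.

```python
import numpy as np

def sparse_one_hidden_layer_loss(W1, W2, M1, M2, X, Y, activation=None):
    """Squared loss of a one-hidden-layer network with masked weights.

    Assumed shapes (hypothetical convention):
      X: (d_in, n), Y: (d_out, n), W1/M1: (h, d_in), W2/M2: (d_out, h).
    A mask entry of 0 removes the corresponding connection.
    """
    hidden = (M1 * W1) @ X              # sparse first layer
    if activation is not None:          # None gives a linear network
        hidden = activation(hidden)
    pred = (M2 * W2) @ hidden           # final layer, dense if M2 is all ones
    return 0.5 * np.sum((pred - Y) ** 2)

rng = np.random.default_rng(0)
d_in, h, d_out, n = 4, 6, 2, 10
X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))
W1 = rng.standard_normal((h, d_in))
W2 = rng.standard_normal((d_out, h))

# Random sparsity pattern for the first layer (assumption: about 50% kept).
M1 = (rng.random((h, d_in)) < 0.5).astype(float)

# Setting 1: sparse first layer, dense final layer.
M2_dense = np.ones((d_out, h))
# Setting 2: sparse first layer, sparse final layer.
M2_sparse = (rng.random((d_out, h)) < 0.5).astype(float)

def relu(z):
    return np.maximum(z, 0.0)

loss_dense_final = sparse_one_hidden_layer_loss(W1, W2, M1, M2_dense, X, Y, relu)
loss_sparse_final = sparse_one_hidden_layer_loss(W1, W2, M1, M2_sparse, X, Y, relu)
print(loss_dense_final, loss_sparse_final)
```

Because masked-out weights never reach the output, the loss is constant along those coordinates; only the unmasked entries are free parameters of the landscape being studied.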

Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Stochastic Gradient Optimization Techniques