Piecewise linear activations substantially shape the loss surfaces of neural networks
Fengxiang He, Bohan Wang, Dacheng Tao

TL;DR
This paper investigates how piecewise linear activation functions shape the loss surfaces of neural networks. It shows that such networks have infinitely many spurious local minima and describes how the loss landscape is partitioned into cells.
Contribution
It proves that neural networks with piecewise linear activations have infinitely many spurious local minima, and it characterizes the geometric structure of their loss surfaces.
Findings
Neural networks with piecewise linear (non-linear) activations have infinitely many spurious local minima, that is, local minima with higher empirical risk than the global minima.
Loss surfaces are partitioned into smooth cells separated by nondifferentiable boundaries.
Local minima within a cell form an equivalence class and are often global minima.
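The cell structure in the findings above can be illustrated with a minimal sketch. It is not the authors' construction: it assumes a one-hidden-layer ReLU network with squared loss, random toy data, and hypothetical helper names (activation_pattern, relu_net, masked_net). The idea shown is that, as long as the activation pattern on the training samples does not change, the network output has no max() left in it and is multilinear in the weights, so the loss is smooth inside that region.

```python
import numpy as np

def activation_pattern(W1, X):
    """Boolean mask: which hidden units are active on which sample."""
    return (X @ W1.T) > 0

def relu_net(W1, w2, X):
    """One-hidden-layer ReLU network (illustrative toy model)."""
    return np.maximum(X @ W1.T, 0.0) @ w2

def masked_net(W1, w2, X, mask):
    """Same network with the activation pattern frozen to `mask`.
    No max() remains, so the output is multilinear in (W1, w2)."""
    return ((X @ W1.T) * mask) @ w2

def squared_loss(pred, y):
    return 0.5 * np.mean((pred - y) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # 8 toy samples, 3 features (assumed sizes)
y = rng.normal(size=8)
W1 = rng.normal(size=(4, 3))  # 4 hidden units
w2 = rng.normal(size=4)

mask = activation_pattern(W1, X)

# Perturb W1 by less than the smallest pre-activation margin, so no
# pre-activation changes sign and the point stays in the same cell.
margin = np.abs(X @ W1.T).min()
delta = rng.normal(size=W1.shape)
delta *= 0.5 * margin / np.abs(X @ delta.T).max()
W1_near = W1 + delta

same_cell = np.array_equal(activation_pattern(W1_near, X), mask)
loss_relu = squared_loss(relu_net(W1_near, w2, X), y)
loss_masked = squared_loss(masked_net(W1_near, w2, X, mask), y)
print(same_cell, np.isclose(loss_relu, loss_masked))
```

Larger perturbations can flip an entry of the mask; those are the points where the loss crosses a nondifferentiable boundary into a neighboring cell.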
Abstract
Understanding the loss surface of a neural network is fundamentally important to the understanding of deep learning. This paper presents how piecewise linear activation functions substantially shape the loss surfaces of neural networks. We first prove that the loss surfaces of many neural networks have infinite spurious local minima which are defined as the local minima with higher empirical risks than the global minima. Our result demonstrates that the networks with piecewise linear activations possess substantial differences to the well-studied linear neural networks. This result holds for any neural network with arbitrary depth and arbitrary piecewise linear activation functions (excluding linear functions) under most loss functions in practice. Essentially, the underlying assumptions are consistent with most practical circumstances where the output layer is narrower than any…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks
