Spurious Local Minima Are Common for Deep Neural Networks with Piecewise Linear Activations
Bo Liu

TL;DR
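This paper shows theoretically that spurious local minima are common in deep fully-connected networks and CNNs with piecewise linear activations. Each output neuron produces a continuous piecewise linear (CPWL) function, and its different pieces can fit disjoint groups of data samples, which usually yields different levels of empirical risk.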
Contribution
It gives a theoretical explanation for why spurious local minima are common in deep networks with piecewise linear activations, supported by a motivating example and a proof technique built on a representation of CPWL functions.
Findings
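Spurious local minima are common in deep fully-connected networks and CNNs with piecewise linear activations when the dataset cannot be fitted by a linear model.
Different pieces of a CPWL output can fit disjoint groups of data samples; different groupings usually give different levels of empirical risk, which produces spurious local minima.
The result is proved in general settings with any continuous loss function.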
Abstract
In this paper, it is shown theoretically that spurious local minima are common for deep fully-connected networks and convolutional neural networks (CNNs) with piecewise linear activation functions and datasets that cannot be fitted by linear models. A motivating example is given to explain the reason for the existence of spurious local minima: each output neuron of deep fully-connected networks and CNNs with piecewise linear activations produces a continuous piecewise linear (CPWL) output, and different pieces of CPWL output can fit disjoint groups of data samples when minimizing the empirical risk. Fitting data samples with different CPWL functions usually results in different levels of empirical risk, leading to prevalence of spurious local minima. This result is proved in general settings with any continuous loss function. The main proof technique is to represent a CPWL function as a…
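The mechanism described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not the paper's construction: a hypothetical four-point 1-D dataset, a network with a single hidden ReLU unit, and squared loss. Configuration A places the hinge so that the two linear pieces fit two disjoint groups of samples; configuration B keeps every sample on the same linear piece, so the best it can do is the least-squares line.

```python
import numpy as np

# Hypothetical toy dataset (not from the paper); no single line fits it exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 2.0])

def net(params, x):
    """One-hidden-unit ReLU network: f(x) = v * max(0, w*x + b) + c (a CPWL function)."""
    w, b, v, c = params
    return v * np.maximum(0.0, w * x + b) + c

def risk(params):
    """Empirical risk under squared loss (one possible continuous loss)."""
    return float(np.mean((net(params, x) - y) ** 2))

# Configuration A: hinge at x = 1. The flat piece fits samples {0, 1},
# the sloped piece fits samples {2, 3}.
params_a = np.array([1.0, -1.0, 1.0, 0.0])

# Configuration B: w*x + b >= 5 on every sample, so all samples lie on one
# linear piece and the network output equals the least-squares line on the data.
slope, intercept = np.polyfit(x, y, 1)
params_b = np.array([1.0, 5.0, slope, intercept - 5.0 * slope])

risk_a = risk(params_a)
risk_b = risk(params_b)

# Empirical local-minimum check for B: small random perturbations keep every
# sample on the same piece, so the risk cannot drop below the least-squares value.
rng = np.random.default_rng(0)
perturbed = [risk(params_b + 1e-3 * rng.standard_normal(4)) for _ in range(1000)]
b_looks_like_local_min = all(r >= risk_b - 1e-12 for r in perturbed)

print(f"risk A (two pieces fit two groups): {risk_a:.3f}")
print(f"risk B (one piece fits all samples): {risk_b:.3f}")
print("B survives small perturbations:", b_looks_like_local_min)
```

In this sketch, configuration A reaches zero empirical risk while configuration B is stuck at a strictly higher risk that small perturbations do not improve, which is the kind of spurious local minimum the abstract describes. The random-perturbation check is only numerical evidence for this toy case, not a proof.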
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
