Theoretical properties of the global optimizer of two layer neural network

Digvijay Boob; Guanghui Lan

arXiv:1710.11241 · cs.LG · November 1, 2017 · 27 citations

TL;DR

This paper analyzes when training a two-layer neural network with a differentiable activation function reaches a global optimum. It shows that first-order optimal solutions are globally optimal when the hidden layer is non-singular, and it analyzes the smoothness of the training objective.

Contribution

It establishes conditions under which first-order optimal solutions are globally optimal for two-layer neural networks with non-singular hidden layers and analyzes the smoothness and convergence properties of the optimization landscape.

Findings

01

First-order optimal solutions are globally optimal when the hidden layer is non-singular.

02

The objective function is Lipschitz smooth, facilitating convergence analysis.

03

The proposed approach maintains non-singularity of the hidden layer during optimization.
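Finding 02 is what lets standard first-order convergence arguments apply: if the objective is L-smooth, a gradient step of size at most 1/L is guaranteed to decrease it. The sketch below is a generic illustration under assumptions of our own, not the paper's algorithm. It assumes a two-layer network f(x) = θᵀ tanh(Wx) with the output weights θ held fixed, a squared loss, and only the hidden-layer weights W trained. It runs gradient descent with Armijo backtracking, which finds an acceptable step without knowing L. The function names, the tanh activation, and the problem sizes are hypothetical, and the sketch does not enforce the non-singularity property of finding 03.

```python
import numpy as np

def loss_and_grad(W, theta, X, y):
    """Return 0.5 * ||tanh(X W^T) theta - y||^2 and its gradient w.r.t. W.

    Assumed shapes: X (N, d) inputs, W (n, d) hidden-layer weights,
    theta (n,) output weights (held fixed), y (N,) targets.
    """
    H = np.tanh(X @ W.T)                      # (N, n) hidden activations
    r = H @ theta - y                         # (N,) residuals
    # dL/dW[j, k] = sum_i r_i * theta_j * (1 - tanh(z_ij)^2) * X[i, k]
    G = (r[:, None] * (1.0 - H ** 2) * theta[None, :]).T @ X   # (n, d)
    return 0.5 * float(r @ r), G

def gd_backtracking(W, theta, X, y, steps=200, t0=1.0, beta=0.5, c=0.5):
    """Gradient descent on W with Armijo backtracking; returns (W, loss history)."""
    f, G = loss_and_grad(W, theta, X, y)
    history = [f]
    for _ in range(steps):
        g2 = float(np.sum(G * G))
        if g2 < 1e-20:                        # numerically first-order stationary
            break
        t = t0
        # Halve t until the sufficient-decrease (Armijo) test passes; for an
        # L-smooth objective with c = 0.5 this holds for every t <= 1/L.
        while True:
            W_new = W - t * G
            f_new, G_new = loss_and_grad(W_new, theta, X, y)
            if f_new <= f - c * t * g2 or t < 1e-12:
                break
            t *= beta
        if f_new > f:                         # safeguard: never accept an increase
            break
        W, f, G = W_new, f_new, G_new
        history.append(f)
    return W, history

# Hypothetical toy problem: n * d = 24 trainable weights, N = 6 samples.
rng = np.random.default_rng(0)
N, d, n = 6, 3, 8
X = rng.standard_normal((N, d))
y = rng.standard_normal(N)
W0 = rng.standard_normal((n, d))
theta = rng.standard_normal(n)

W_final, hist = gd_backtracking(W0, theta, X, y)
print(f"loss: {hist[0]:.4f} -> {hist[-1]:.4f} after {len(hist) - 1} steps")
```

Backtracking is used here only because the smoothness constant of this toy objective is not known in closed form. If a valid L were known, a fixed step of 1/L would give the same guaranteed decrease of at least ‖∇f‖²/(2L) per step.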

Abstract

In this paper, we study the problem of optimizing a two-layer artificial neural network that best fits a training dataset. We look at this problem in the setting where the number of parameters is greater than the number of sampled points. We show that for a wide class of differentiable activation functions (this class involves "almost" all functions which are not piecewise linear), first-order optimal solutions are globally optimal provided the hidden layer is non-singular. Our results extend easily from hidden layers given by a square matrix to hidden layers given by a flat matrix. The results apply even if the network has more than one hidden layer, provided all hidden layers are non-singular, all activations are from the given "good" class of differentiable functions, and optimization is only with respect to the last hidden layer. We also study the smoothness…
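The central claim of the abstract rests on a linear-algebra fact that can be checked numerically. For the squared loss 0.5‖r‖², where r collects the N residuals, the gradient with respect to the trained weights is Jᵀr, with J the N × p Jacobian of the network outputs. When p ≥ N (more parameters than samples) and J has full row rank N, ‖Jᵀr‖ ≥ σ_min(J)‖r‖ with σ_min(J) > 0, so a zero gradient forces r = 0, which is a global minimum. The sketch below is an assumption-laden illustration, not the paper's proof. It holds the output weights θ fixed, trains only the hidden-layer weights W, and uses tanh as one example of a smooth activation that is not piecewise linear (whether tanh belongs to the paper's "good" class is not checked here). It tests the full-rank condition on the Jacobian directly rather than the paper's non-singularity condition on the hidden layer. The helper name `output_jacobian` and all sizes are hypothetical.

```python
import numpy as np

def output_jacobian(W, theta, X):
    """Jacobian of the N outputs tanh(X W^T) @ theta w.r.t. vec(W) (row-major).

    Assumption: theta is fixed and only W (n, d) is trained.
    Row i, column j*d + k holds theta_j * (1 - tanh(w_j . x_i)^2) * X[i, k].
    """
    N, d = X.shape
    n = W.shape[0]
    D = (1.0 - np.tanh(X @ W.T) ** 2) * theta[None, :]     # (N, n)
    # Per-sample outer product: J[i, j, k] = D[i, j] * X[i, k]
    return (D[:, :, None] * X[:, None, :]).reshape(N, n * d)

# Hypothetical sizes with n * d = 24 >= N = 6 (over-parameterized).
rng = np.random.default_rng(1)
N, d, n = 6, 3, 8
X = rng.standard_normal((N, d))
y = rng.standard_normal(N)
W = rng.standard_normal((n, d))
theta = rng.standard_normal(n)

J = output_jacobian(W, theta, X)
r = np.tanh(X @ W.T) @ theta - y                 # residuals
grad = J.T @ r                                   # gradient of 0.5*||r||^2 w.r.t. vec(W)
s_min = np.linalg.svd(J, compute_uv=False)[-1]   # smallest of the N singular values
rank = np.linalg.matrix_rank(J)
print(rank, s_min)
# If rank == N, then ||grad|| >= s_min * ||r|| with s_min > 0,
# so grad == 0 forces r == 0, i.e. zero training loss (a global minimum).
```

If the Jacobian has rank below N (for example, with fewer trainable weights than samples, n·d < N), the argument breaks: Jᵀ then has a nontrivial null space, so a stationary point can have nonzero residuals.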

Taxonomy

Topics: Neural Networks and Applications · Face and Expression Recognition · Metaheuristic Optimization Algorithms Research