# Learning One-hidden-layer neural networks via Provable Gradient Descent with Random Initialization

**Authors:** Shuhao Xia, Yuanming Shi

arXiv: 1907.06594 · 2019-07-17

## TL;DR

This paper demonstrates that gradient descent with random initialization can efficiently learn one-hidden-layer neural networks with quadratic activations in an under-parameterized setting, with provable convergence guarantees.

## Contribution

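It provides the first provable analysis showing that gradient descent with random initialization converges to a global optimum for one-hidden-layer neural networks with quadratic activations in the under-parameterized regime, at a linear rate and with near-optimal sample complexity.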

## Key findings

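- Starting from a random initialization, the gradient descent iterates enter a local region that is strongly convex and smooth within a few iterations.
- Inside that region, the iterates converge to a globally optimal model at a linear rate, with near-optimal sample complexity.
- Experiments corroborate the theoretical findings.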
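
The link between a strongly convex, smooth region and a linear rate is a textbook fact about gradient descent. The statement below is a generic sketch, not the paper's theorem: the symbols $f$, $w_t$, $w^\star$, $\alpha$, and $\beta$ are notation assumed here for illustration. Suppose $f$ is $\alpha$-strongly convex and $\beta$-smooth on a convex region that contains the iterates and a minimizer $w^\star$ with $\nabla f(w^\star) = 0$. Then gradient descent with step size $1/\beta$ contracts the distance to $w^\star$ by a constant factor at every step:

$$
w_{t+1} = w_t - \tfrac{1}{\beta}\,\nabla f(w_t),
\qquad
\|w_{t+1} - w^\star\|_2^2 \;\le\; \left(1 - \tfrac{\alpha}{\beta}\right)\|w_t - w^\star\|_2^2 .
$$

Unrolling the inequality gives $\|w_t - w^\star\|_2^2 \le (1 - \alpha/\beta)^t\,\|w_0 - w^\star\|_2^2$, which is the sense in which the rate is called linear.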

## Abstract

Although deep learning has shown powerful performance in many applications, the mathematical principles behind neural networks are still mysterious. In this paper, we consider the problem of learning a one-hidden-layer neural network with quadratic activations. We focus on the under-parameterized regime, where the number of hidden units is smaller than the dimension of the inputs. We propose to solve the problem via a provable gradient-based method with random initialization. For the non-convex neural network training problem, we reveal that the gradient descent iterates are able to enter a local region that enjoys strong convexity and smoothness within a few iterations, and then provably converge to a globally optimal model at a linear rate with near-optimal sample complexity. We further corroborate our theoretical findings via various experiments.
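
The setting in the abstract can be illustrated with a small NumPy experiment. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes unit output weights, so the network output is the sum of squared hidden pre-activations, along with Gaussian inputs, noise-free labels, a squared loss scaled by `1/(4m)`, and a fixed step size. The names `quadratic_net`, `loss_and_grad`, and `run_gd` and all default values are hypothetical choices made for this example.

```python
import numpy as np


def quadratic_net(W, X):
    """Network output sum_j (w_j^T x)^2 for each row x of X.

    W has shape (d, k) with k hidden units; X has shape (m, d).
    Unit output weights are an assumption of this sketch.
    """
    return np.sum((X @ W) ** 2, axis=1)


def loss_and_grad(W, X, y):
    """Squared loss (1/(4m)) * sum_i (net(x_i) - y_i)^2 and its gradient in W."""
    m = X.shape[0]
    Z = X @ W                            # hidden pre-activations, shape (m, k)
    r = np.sum(Z ** 2, axis=1) - y       # residuals, shape (m,)
    loss = np.sum(r ** 2) / (4 * m)
    grad = X.T @ (r[:, None] * Z) / m    # (1/m) * sum_i r_i x_i x_i^T W
    return loss, grad


def run_gd(d=10, k=2, m=2000, step=0.02, iters=2000, seed=0):
    """Plain gradient descent from a random start in the regime k < d."""
    rng = np.random.default_rng(seed)
    # Planted weights with orthonormal columns (assumption, keeps scales simple).
    W_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
    X = rng.standard_normal((m, d))
    y = quadratic_net(W_star, X)         # noise-free labels (assumption)

    W = rng.standard_normal((d, k)) / np.sqrt(d)   # random initialization
    losses = []
    for _ in range(iters):
        loss, grad = loss_and_grad(W, X, y)
        losses.append(loss)
        W = W - step * grad

    # W is identifiable only up to an orthogonal transform, so compare W W^T.
    M, M_star = W @ W.T, W_star @ W_star.T
    rel_err = np.linalg.norm(M - M_star) / np.linalg.norm(M_star)
    return np.array(losses), rel_err


if __name__ == "__main__":
    losses, rel_err = run_gd()
    print("initial loss:", losses[0])
    print("final loss:", losses[-1])
    print("relative error in W W^T:", rel_err)
```

Plotting `losses` on a logarithmic vertical axis is one way to look for the two phases the abstract describes: a short initial stage, followed by a roughly straight line once the decrease becomes geometric. Recovery is measured through `W @ W.T` because the output of this model does not change when `W` is multiplied on the right by an orthogonal matrix.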

---
Source: https://tomesphere.com/paper/1907.06594