Guaranteed Recovery of One-Hidden-Layer Neural Networks via Cross Entropy
Haoyu Fu, Yuejie Chi, Yingbin Liang

TL;DR
This paper proves that, for Gaussian inputs, gradient descent on the cross-entropy loss recovers the weights of a one-hidden-layer neural network, with guarantees on convergence, sample complexity, and initialization.
Contribution
It establishes the first global convergence guarantees for empirical risk minimization with cross entropy in one-hidden-layer neural networks, including initialization and sample complexity analysis.
Findings
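Gradient descent converges linearly to a critical point that is provably close to the ground truth when initialized in a local neighborhood of it.
With Gaussian inputs and a sufficiently large sample size, the empirical risk exhibits strong convexity and smoothness uniformly in a local neighborhood of the ground truth.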
The tensor method provides a valid initialization for the training process.
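The link between the second finding and the first can be sketched with a standard optimization argument. The notation here is an assumption introduced for illustration and is not taken from the paper: $f_n$ is the empirical risk, $W$ is the vectorized weight matrix, $m$ and $M$ are the local strong convexity and smoothness constants, $\widehat{W}$ is the critical point inside the neighborhood, and $\eta$ is the step size. The sketch also assumes the iterates stay inside the neighborhood where the bounds hold.

```latex
% Local strong convexity and smoothness (assumed notation):
m I \preceq \nabla^2 f_n(W) \preceq M I
\quad \text{for all } W \text{ in the neighborhood}, \qquad 0 < m \le M .

% Gradient descent with step size \eta = 1/M:
W_{t+1} = W_t - \eta \, \nabla f_n(W_t) .

% Standard contraction for m-strongly convex, M-smooth objectives:
\| W_{t+1} - \widehat{W} \|_2^2 \le \left( 1 - \frac{m}{M} \right) \| W_t - \widehat{W} \|_2^2
\;\Longrightarrow\;
\| W_t - \widehat{W} \|_2^2 \le \left( 1 - \frac{m}{M} \right)^{t} \| W_0 - \widehat{W} \|_2^2 .
```

The geometric decay in the last line is what "linear convergence" refers to. The paper's actual constants and neighborhood radius are not given in this summary.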
Abstract
We study model recovery for data classification, where the training labels are generated from a one-hidden-layer neural network with sigmoid activations, and the goal is to recover the weights of the neural network. We consider two network models, the fully-connected network (FCN) and the non-overlapping convolutional neural network (CNN). We prove that with Gaussian inputs, the empirical risk based on cross entropy exhibits strong convexity and smoothness uniformly in a local neighborhood of the ground truth, as soon as the sample complexity is sufficiently large. This implies that if initialized in this neighborhood, gradient descent converges linearly to a critical point that is provably close to the ground truth. Furthermore, we show such an initialization can be obtained via the tensor method. This establishes the global…
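The setup described in the abstract can be illustrated with a minimal NumPy sketch. Several parts are assumptions, not details from this summary: the FCN output is taken to be the average of K sigmoid units and is read as the probability of label 1, the function names (`forward`, `cross_entropy`, `gradient`, `gradient_descent`) are hypothetical, and the starting point is a small random perturbation of the ground truth that stands in for the tensor-method initialization, which is not implemented here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(W, X):
    """Assumed FCN output: average of K sigmoid units. W is (d, K), X is (n, d)."""
    return sigmoid(X @ W).mean(axis=1)


def cross_entropy(W, X, y, eps=1e-12):
    """Empirical cross-entropy risk over n samples with labels y in {0, 1}."""
    p = np.clip(forward(W, X), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def gradient(W, X, y, eps=1e-12):
    """Gradient of the empirical risk with respect to W, shape (d, K)."""
    n, K = X.shape[0], W.shape[1]
    S = sigmoid(X @ W)                           # hidden activations, (n, K)
    p = np.clip(S.mean(axis=1), eps, 1.0 - eps)  # predicted P(y = 1), (n,)
    dloss_dp = (1.0 - y) / (1.0 - p) - y / p     # per-sample d(loss)/dp
    # Chain rule: dp/dw_k = (1/K) * s_k * (1 - s_k) * x
    return X.T @ (dloss_dp[:, None] * S * (1.0 - S)) / (n * K)


def gradient_descent(W0, X, y, step=0.5, iters=300):
    """Plain gradient descent on the full sample (no fresh samples per step)."""
    W = W0.copy()
    for _ in range(iters):
        W -= step * gradient(W, X, y)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K, n = 5, 2, 5000                          # illustrative sizes only
    W_star = rng.standard_normal((d, K))          # ground-truth weights
    X = rng.standard_normal((n, d))               # Gaussian inputs
    y = (rng.random(n) < forward(W_star, X)).astype(float)
    # Stand-in for the tensor-method initialization: start near W_star.
    W0 = W_star + 0.1 * rng.standard_normal((d, K))
    W_hat = gradient_descent(W0, X, y)
    print("risk at init :", cross_entropy(W0, X, y))
    print("risk at final:", cross_entropy(W_hat, X, y))
```

The step size and iteration count are arbitrary choices for the demonstration, not values prescribed by the paper. The same data set is reused at every iteration, matching the empirical risk minimization setting the abstract describes.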
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
