Learning Halfspaces and Neural Networks with Random Initialization

Yuchen Zhang; Jason D. Lee; Martin J. Wainwright; Michael I. Jordan

arXiv:1511.07948 · cs.LG · November 26, 2015 · 22 cites

TL;DR

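This paper introduces algorithms for learning halfspaces and neural networks via random initialization. The algorithms achieve arbitrarily small excess risk in time polynomial in the input dimension and sample size, though exponential in a quantity that depends on the Lipschitz constant of the loss and the target excess risk. When the data are separable by a neural network with a constant margin, learning is possible in polynomial time.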

Contribution

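It presents algorithms for non-convex empirical risk minimization that run multiple rounds of random initialization followed by arbitrary optimization steps. These come with guarantees of small excess risk for Lipschitz losses and of polynomial-time learnability when the data are separable with a constant margin.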
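
The pattern "random initialization followed by optimization steps" can be illustrated for a halfspace with a minimal sketch. This is not the authors' algorithm: the ramp loss (a 1-Lipschitz, non-convex loss), the unit-ball constraint, the step size, the number of rounds, and all function names (`ramp_loss`, `random_restart_erm`, etc.) are hypothetical choices made here only to show the restart-then-optimize structure.

```python
import numpy as np


def ramp_loss(w, X, y):
    """Empirical ramp loss: mean of min(1, max(0, 1 - y_i <w, x_i>)) (hypothetical loss choice)."""
    margins = y * (X @ w)
    return float(np.mean(np.clip(1.0 - margins, 0.0, 1.0)))


def ramp_subgradient(w, X, y):
    """A subgradient of ramp_loss with respect to w."""
    margins = y * (X @ w)
    # The ramp has slope -1 in the margin only where 0 < margin < 1; elsewhere it is flat.
    active = ((margins > 0.0) & (margins < 1.0)).astype(float)
    return -(X * (y * active)[:, None]).mean(axis=0)


def project_unit_ball(w):
    """Project w onto the Euclidean unit ball {w : ||w|| <= 1}."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 1.0 else w


def random_restart_erm(X, y, rounds=50, steps=100, lr=0.5, seed=0):
    """Hypothetical sketch: random restarts, each followed by local optimization.

    Returns the iterate with the lowest empirical ramp loss seen across all rounds.
    """
    rng = np.random.default_rng(seed)
    best_w, best_loss = None, np.inf
    for _ in range(rounds):
        # Random initialization: a uniformly random direction on the unit sphere.
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        for step in range(steps + 1):
            loss = ramp_loss(w, X, y)
            if loss < best_loss:
                best_w, best_loss = w.copy(), loss
            if step < steps:
                # Optimization step: projected subgradient descent on the ramp loss.
                w = project_unit_ball(w - lr * ramp_subgradient(w, X, y))
    return best_w, best_loss
```

Tracking the best iterate, rather than the last one, keeps the result no worse than any of the random initializations, which matters because subgradient steps on a non-convex loss are not guaranteed to decrease it.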

Findings

01

Algorithms achieve arbitrarily small excess risk in time polynomial in the input dimension and sample size, though exponential in $(L/\epsilon^{2})\log(L/\epsilon)$.

02

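If the data are separable by some neural network with constant margin $\gamma > 0$, a polynomial-time algorithm learns a network that separates the training data with margin $\Omega(\gamma)$.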

03

The learnability guarantee is shown to be robust to label noise in which labels are randomly flipped.
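
The two conditions behind Findings 02 and 03, a separating margin and randomly flipped labels, can be made concrete for the simplest case of a linear classifier. This is a minimal sketch under stated assumptions: the margin here is the geometric margin $\min_i y_i \langle w, x_i \rangle / \|w\|$ of a halfspace (the paper's margin definition for neural networks may differ), and the noise model flips each $\pm 1$ label independently with probability $\eta$, with the restriction $\eta < 1/2$ assumed here. The helper names `halfspace_margin` and `flip_labels` are hypothetical.

```python
import numpy as np


def halfspace_margin(w, X, y):
    """Geometric margin min_i y_i <w, x_i> / ||w|| of a linear classifier (hypothetical helper).

    Positive exactly when w separates the labeled points (X, y).
    """
    w = np.asarray(w, dtype=float)
    return float(np.min(y * (X @ w)) / np.linalg.norm(w))


def flip_labels(y, eta, seed=0):
    """Random classification noise: flip each +/-1 label independently with probability eta."""
    # Assumed restriction: with eta >= 1/2 the noisy labels carry no usable signal.
    if not 0.0 <= eta < 0.5:
        raise ValueError("eta must lie in [0, 0.5)")
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    flips = rng.random(y.shape[0]) < eta
    return np.where(flips, -y, y)


X = np.array([[2.0, 0.0], [-1.0, 0.5], [3.0, 1.0]])
y = np.array([1.0, -1.0, 1.0])
print(halfspace_margin([1.0, 0.0], X, y))   # separating direction: positive margin
print(halfspace_margin([-1.0, 0.0], X, y))  # reversed direction: negative margin
```

Dividing by $\|w\|$ makes the margin invariant to rescaling $w$, so a "constant margin" cannot be obtained simply by enlarging the weights.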

Abstract

We study non-convex empirical risk minimization for learning halfspaces and neural networks. For loss functions that are $L$-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk $\epsilon > 0$. The time complexity is polynomial in the input dimension $d$ and the sample size $n$, but exponential in the quantity $(L/\epsilon^{2})\log(L/\epsilon)$. These algorithms run multiple rounds of random initialization followed by arbitrary optimization steps. We further show that if the data is separable by some neural network with constant margin $\gamma > 0$, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin $\Omega(\gamma)$. As a consequence, the algorithm achieves arbitrary generalization error $\epsilon > 0$ with $\mathrm{poly}(d, 1/\epsilon)$ sample and…
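
The abstract states only that the running time is exponential in $q = (L/\epsilon^{2})\log(L/\epsilon)$; it does not give the constant in the exponent. The sketch evaluates $q$ for $L = 1$ and a few target values of $\epsilon$ and, purely for illustration, treats the cost factor as $e^{q}$. Setting the constant in the exponent to 1 is an assumption made here, and `exponent_quantity` is a hypothetical name.

```python
import math


def exponent_quantity(L, eps):
    """q = (L / eps^2) * log(L / eps), the quantity the time bound is exponential in."""
    if L <= 0 or eps <= 0:
        raise ValueError("L and eps must be positive")
    return (L / eps ** 2) * math.log(L / eps)


# Illustration only: treat exp(q) as the cost factor, with the unknown constant set to 1.
for eps in (0.5, 0.25, 0.1):
    q = exponent_quantity(1.0, eps)
    print(f"eps={eps}: q={q:.1f}, exp(q) ~ 10^{q / math.log(10):.0f}")
```

Halving $\epsilon$ more than quadruples $q$, which is why the bound is practical only for moderate accuracy targets relative to $L$.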

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Machine Learning and ELM