Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data

Hongkang Li; Shuai Zhang; Meng Wang

arXiv:2207.03615·cs.LG·January 30, 2023

TL;DR

This paper studies the training and generalization of one-hidden-layer neural networks with Gaussian mixture inputs, providing theoretical guarantees on convergence, sample complexity, and the influence of input distributions.

Contribution

It offers the first theoretical analysis of how Gaussian mixture input distributions affect learning rates and sample complexity in neural networks.

Findings

01

Linear convergence to critical points with finite samples

02

Characterization of input distribution impact on sample complexity

03

Guaranteed generalization error bounds

Abstract

This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow a Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated by a teacher model with unknown ground-truth weights, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. Given a finite number of training samples, referred to as the sample complexity, the iterates are proved to converge linearly to a critical point with guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distribution on the sample complexity and the learning rate.
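The teacher-student setup described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sigmoid activation, the averaging output layer, the noiseless squared-loss labels, the isotropic mixture components, the initialization near the teacher weights, and all names (`sample_gmm`, `forward`, `train`) and hyperparameters. It only shows the shape of the problem: draw inputs from a Gaussian mixture, label them with a fixed teacher network, and run gradient descent on the student's empirical risk.

```python
import numpy as np

def sample_gmm(n, weights, means, stds, rng):
    """Draw n inputs from a mixture of isotropic Gaussians (assumed form)."""
    comps = rng.choice(len(weights), size=n, p=weights)  # component per sample
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comps] + stds[comps][:, None] * noise

def hidden(W, X):
    """Sigmoid hidden-layer activations, shape (n, K)."""
    return 1.0 / (1.0 + np.exp(-X @ W.T))

def forward(W, X):
    """One-hidden-layer network: average of the K hidden units."""
    return hidden(W, X).mean(axis=1)

def empirical_risk(W, X, y):
    """Squared-loss empirical risk (an illustrative choice of loss)."""
    return 0.5 * np.mean((forward(W, X) - y) ** 2)

def gradient(W, X, y):
    """Gradient of empirical_risk with respect to W, shape (K, d)."""
    H = hidden(W, X)
    resid = H.mean(axis=1) - y
    G = resid[:, None] * H * (1.0 - H) / W.shape[0]  # chain rule through sigmoid
    return G.T @ X / X.shape[0]

def train(W0, X, y, lr=0.5, steps=500):
    """Plain gradient descent on the student weights."""
    W = W0.copy()
    for _ in range(steps):
        W -= lr * gradient(W, X, y)
    return W

rng = np.random.default_rng(0)
d, K, n = 5, 3, 2000
weights = np.array([0.5, 0.5])                          # mixture weights
means = np.stack([0.5 * np.ones(d), -0.5 * np.ones(d)])  # component means
stds = np.array([1.0, 0.5])                             # component std devs

W_star = rng.standard_normal((K, d))                    # unknown teacher weights
X = sample_gmm(n, weights, means, stds, rng)
y = forward(W_star, X)                                  # teacher-generated labels

W0 = W_star + 0.1 * rng.standard_normal((K, d))         # start near the teacher
W_hat = train(W0, X, y)

# Risk on fresh samples as a proxy for generalization error.
X_test = sample_gmm(n, weights, means, stds, rng)
y_test = forward(W_star, X_test)
print("train risk:", empirical_risk(W0, X, y), "->", empirical_risk(W_hat, X, y))
print("test risk: ", empirical_risk(W_hat, X_test, y_test))
```

Varying `means` and `stds` in this sketch is one informal way to explore how the input distribution affects how many samples and iterations are needed, which is the dependence the paper characterizes theoretically.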

Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Face and Expression Recognition