Understanding Two-Layer Neural Networks with Smooth Activation Functions
Changcun Huang

TL;DR
This paper analyzes the training solutions of two-layer neural networks with smooth activation functions, revealing their approximation capabilities and underlying mechanisms through theoretical proofs and experiments.
Contribution
It introduces a novel framework for understanding two-layer networks with smooth activations, including new proofs and insights into their solution space.
Findings
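The universal approximation property is proved for arbitrary input dimensionality.
Experimental verification supports the theoretical analysis.
The mechanism is explained through four principles: construction of Taylor series expansions, strict partial order of knots, smooth-spline implementation, and smooth-continuity restriction.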
Abstract
This paper aims to understand the training solution, obtained by the back-propagation algorithm, of two-layer neural networks whose hidden layer is composed of units with smooth activation functions, including the usual sigmoid type that was most commonly used before the advent of ReLUs. The mechanism rests on four main principles: construction of Taylor series expansions, strict partial order of knots, smooth-spline implementation, and smooth-continuity restriction. Universal approximation for arbitrary input dimensionality is proved and experimental verification is given, through which the "black box" mystery of the solution space is largely revealed. The new proofs employed also enrich approximation theory.
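For readers who want a concrete picture of the setting, the following is a minimal sketch of the kind of model the abstract describes: a two-layer network with a sigmoid hidden layer, trained by back-propagation with plain gradient descent. It is not the paper's code. The function name `train_two_layer`, the hidden width, learning rate, step count, and the sine target are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Smooth sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))


def train_two_layer(x, y, hidden=8, lr=0.02, steps=2000, seed=0):
    """Fit y ~ sum_j a_j * sigmoid(x @ W[:, j] + b_j) + c by gradient descent.

    x has shape (n, d) and y has shape (n,). All hyperparameters are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W = rng.normal(size=(d, hidden))      # input-to-hidden weights
    b = rng.normal(size=hidden)           # hidden biases
    a = 0.1 * rng.normal(size=hidden)     # hidden-to-output weights
    c = 0.0                               # output bias
    losses = []
    for _ in range(steps):
        # Forward pass
        h = sigmoid(x @ W + b)            # (n, hidden)
        pred = h @ a + c                  # (n,)
        err = pred - y
        losses.append(float(np.mean(err ** 2)))

        # Backward pass (back-propagation of the mean squared error)
        g_pred = 2.0 * err / n            # dLoss/dpred, shape (n,)
        g_a = h.T @ g_pred                # (hidden,)
        g_c = g_pred.sum()
        g_z = np.outer(g_pred, a) * h * (1.0 - h)  # through the sigmoid
        g_W = x.T @ g_z                   # (d, hidden)
        g_b = g_z.sum(axis=0)

        # Gradient descent update
        W -= lr * g_W
        b -= lr * g_b
        a -= lr * g_a
        c -= lr * g_c
    return (W, b, a, c), losses


if __name__ == "__main__":
    # Hypothetical 1-D target used only to exercise the sketch.
    x = np.linspace(-3.0, 3.0, 64).reshape(-1, 1)
    y = np.sin(x[:, 0])
    _, losses = train_two_layer(x, y)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

The sketch only shows how a training solution of this kind is produced. It does not implement the paper's analysis of that solution in terms of Taylor expansions or knots.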
