Convergence Analysis of the Dynamics of a Special Kind of Two-Layered Neural Networks with $\ell_1$ and $\ell_2$ Regularization

Zhifeng Kong

arXiv:1711.07005·stat.ML·November 21, 2017


TL;DR

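This paper extends the convergence analysis of two-layer ReLU neural networks by adding $\ell_1$ and $\ell_2$ regularization. It proves that, for a small regularization coefficient, the weight vector converges to the optimal solution of the regularized loss with a guaranteed probability, and it supports this with numerical experiments.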

Contribution

It provides a theoretical convergence guarantee for regularized two-layer neural networks with ReLU activation, covering both $\ell_1$ and $\ell_2$ regularization terms.

Findings

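1. The weight vector converges to the optimal solution $\hat{w}$ with probability at least $(1-\epsilon)(1-A_d)/2$ under random initialization.
2. A sufficiently small regularization coefficient $\lambda$, with either the $\ell_1$ or the $\ell_2$ penalty, ensures this convergence.
3. Numerical experiments validate the theoretical results.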

Abstract

In this paper, we extended the convergence analysis of the dynamics of two-layered bias-free networks with one ReLU output. We considered two popular regularization terms, the $\ell_1$ and $\ell_2$ norms of the parameter vector $w$, and added each to the square loss function with coefficient $\lambda/2$. We proved that when $\lambda$ is small, the weight vector $w$ converges to the optimal solution $\hat{w}$ (with respect to the new loss function) with probability $\geq (1-\epsilon)(1-A_d)/2$ under random initializations in a sphere centered at the origin, where $\epsilon$ is a small value and $A_d$ is a constant. Numerical experiments, including phase diagrams and repeated simulations, verified our theory.
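The abstract describes gradient dynamics on a square loss with an added $(\lambda/2)$-weighted $\ell_1$ or $\ell_2$ penalty, started from random initializations around the origin, and checked by repeated simulations. The NumPy sketch below illustrates that setup under assumptions the abstract does not state: the network is read as a single ReLU unit $f(x) = \mathrm{ReLU}(w^\top x)$; labels come from a hypothetical unit-norm ReLU teacher `w_star` with standard Gaussian inputs; training is full-batch gradient descent on a finite sample rather than on the population loss; the penalty is taken literally as $(\lambda/2)\lVert w\rVert_p$ (the abstract does not say whether the $\ell_2$ norm is squared); and "initializations in a sphere" is read as uniform sampling from a ball. All function names are hypothetical, and this is not the code from the linked repository.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def regularized_loss(w, X, y, lam, penalty="l2"):
    """Empirical square loss plus (lam / 2) * ||w||_p, p = 1 or 2 (assumed form of the penalty)."""
    residual = relu(X @ w) - y
    p = 1 if penalty == "l1" else 2
    return 0.5 * np.mean(residual ** 2) + 0.5 * lam * np.linalg.norm(w, ord=p)


def regularized_grad(w, X, y, lam, penalty="l2"):
    """(Sub)gradient of regularized_loss with respect to w."""
    z = X @ w
    active = (z > 0).astype(float)  # ReLU derivative, taken as 0 at z == 0
    residual = relu(z) - y
    data_grad = X.T @ (residual * active) / len(y)
    if penalty == "l1":
        reg_grad = np.sign(w)  # a subgradient of ||w||_1
    else:
        norm = np.linalg.norm(w)
        reg_grad = w / norm if norm > 0 else np.zeros_like(w)  # gradient of ||w||_2
    return data_grad + 0.5 * lam * reg_grad


def sample_in_ball(d, radius, rng):
    """Draw a point uniformly from the d-dimensional ball of the given radius."""
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    return radius * rng.random() ** (1.0 / d) * direction


def run_gd(w0, X, y, lam, penalty="l2", lr=0.2, steps=1000):
    """Plain full-batch gradient descent on the regularized loss."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * regularized_grad(w, X, y, lam, penalty)
    return w


def make_teacher_data(d=5, n=5000, seed=0):
    """Hypothetical teacher-student data: Gaussian inputs, labels from a unit-norm ReLU teacher."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    return X, relu(X @ w_star), w_star


if __name__ == "__main__":
    X, y, w_star = make_teacher_data()
    rng = np.random.default_rng(1)
    lam, trials, tol = 1e-3, 10, 0.1
    for penalty in ("l1", "l2"):
        hits = 0
        for _ in range(trials):
            w0 = sample_in_ball(X.shape[1], radius=1.0, rng=rng)
            w = run_gd(w0, X, y, lam, penalty)
            # For small lam the regularized optimum stays near w_star, so use it as a proxy for w-hat.
            hits += int(np.linalg.norm(w - w_star) < tol)
        print(penalty, "fraction of runs near w_star:", hits / trials)
```

Running the script prints, for each penalty, the fraction of random initializations that end within `tol` of `w_star`. Because `w_star` only stands in for $\hat{w}$, that check is meaningful only when $\lambda$ is small. The $\ell_1$ branch uses `np.sign(w)` as a subgradient, a simple choice rather than a proximal method.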


Code & Models

Repositories

FengNiMa/ReLU_Convergence (PyTorch)


Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM