Convergence of a Relaxed Variable Splitting Method for Learning Sparse Neural Networks via $\ell_1, \ell_0$, and transformed-$\ell_1$ Penalties

Thu Dinh; Jack Xin

arXiv:1812.05719 · math.OC · February 26, 2020 · 5 cites

TL;DR

This paper introduces a relaxed variable splitting method combining thresholding and gradient descent to efficiently learn sparse neural networks with various penalties, ensuring convergence to the true weights with high probability.

Contribution

It proposes a novel optimization approach that guarantees convergence and effective sparsity promotion for learning neural networks with $\ell_1$, $\ell_0$, and transformed-$\ell_1$ penalties.

Findings

01. High-probability convergence to true weights under different penalties.

02. Numerical experiments validate theoretical convergence and sparsity results.

03. Trade-offs between accuracy and sparsity are demonstrated.

Abstract

Sparsification of neural networks is an effective complexity reduction method for improving efficiency and generalizability. We consider the problem of learning a one-hidden-layer convolutional neural network with ReLU activation function via gradient descent under sparsity-promoting penalties. It is known that when the input data is Gaussian distributed, no-overlap networks (without penalties) in regression problems with ground truth can be learned in polynomial time with high probability. We propose a relaxed variable splitting method integrating thresholding and gradient descent to overcome the non-smoothness in the loss function. The sparsity in network weight is realized during the optimization (training) process. We prove that under $\ell_1$, $\ell_0$, and transformed-$\ell_1$ penalties, no-overlap networks can be learned with high probability, and the iterative weights…
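To make the idea of "integrating thresholding and gradient descent" concrete, here is a minimal sketch of one common form of relaxed variable splitting. It is an illustration under stated assumptions, not the authors' algorithm: the objective $f(w) + \lambda P(u) + \frac{\beta}{2}\|w - u\|^2$, the function names, the step sizes, and the least-squares stand-in for the network loss are all hypothetical choices, and the transformed-$\ell_1$ thresholding is omitted. The auxiliary variable `u` absorbs the non-smooth penalty through a closed-form thresholding step, while `w` is updated by plain gradient descent on the smooth part.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal map of t * ||u||_1: shrink every entry toward zero by t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def hard_threshold(w, t):
    # Keep entries with magnitude above t and zero out the rest.
    return np.where(np.abs(w) > t, w, 0.0)

def relaxed_splitting(grad_f, w0, lam=0.1, beta=1.0, eta=0.05,
                      steps=500, penalty="l1"):
    # Assumed objective: f(w) + lam * P(u) + (beta / 2) * ||w - u||^2.
    w = w0.copy()
    u = w0.copy()
    for _ in range(steps):
        # u-step: exact minimization over u is a thresholding of w.
        if penalty == "l1":
            u = soft_threshold(w, lam / beta)
        else:  # "l0"
            u = hard_threshold(w, np.sqrt(2.0 * lam / beta))
        # w-step: gradient descent on the smooth terms only.
        w = w - eta * (grad_f(w) + beta * (w - u))
    return w, u

# Toy stand-in for the network loss: least squares with Gaussian inputs
# and a sparse ground-truth weight vector (hypothetical data).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[:3] = [1.0, -2.0, 0.5]
y = X @ w_true

def grad_f(w):
    # Gradient of (1 / 2n) * ||X w - y||^2.
    return X.T @ (X @ w - y) / len(y)

w, u = relaxed_splitting(grad_f, np.zeros(10))
print("dense iterate w :", np.round(w, 3))
print("sparse iterate u:", np.round(u, 3))
```

In this sketch the sparse weights are read off from `u` rather than `w`, since only the thresholded variable contains exact zeros; the coupling parameter `beta` controls how closely the two variables are forced to agree.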

Taxonomy

Topics: Machine Learning and ELM · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques