# Convergence of a Relaxed Variable Splitting Coarse Gradient Descent Method for Learning Sparse Weight Binarized Activation Neural Networks

**Authors:** Thu Dinh, Jack Xin

arXiv: 1901.09731 · 2019-02-12

## TL;DR

This paper introduces a new relaxed variable splitting method combining thresholding and coarse gradient descent to efficiently learn sparse, binarized activation CNNs, with proven convergence and explicit error bounds.

## Contribution

It develops a novel convergence framework for sparse binarized CNNs using thresholding and coarse gradient descent, with theoretical guarantees and explicit error estimates.

## Key findings

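- Proves that the iterative weights converge to a global limit, which is a transformation of the true weight under a sparsifying operation.
- Establishes that a no-overlap, one-hidden-layer CNN with binary activation can be learned with high probability under thresholding of $\ell_1$, $\ell_0$, and transformed-$\ell_1$ penalties.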
- Provides explicit error bounds for the learned weights.

## Abstract

Sparsification of neural networks is one of the effective complexity reduction methods to improve efficiency and generalizability. Binarized activation offers an additional computational saving for inference. Due to the vanishing gradient issue in training networks with binarized activation, coarse gradient (a.k.a. straight through estimator) is adopted in practice. In this paper, we study the problem of coarse gradient descent (CGD) learning of a one hidden layer convolutional neural network (CNN) with binarized activation function and sparse weights. It is known that when the input data is Gaussian distributed, a no-overlap one hidden layer CNN with ReLU activation and general weights can be learned by GD in polynomial time with high probability in regression problems with ground truth. We propose a relaxed variable splitting method integrating thresholding and coarse gradient descent. The sparsity in network weights is realized through thresholding during the CGD training process. We prove that under thresholding of $\ell_1, \ell_0,$ and transformed-$\ell_1$ penalties, a no-overlap binary activation CNN can be learned with high probability, and the iterative weights converge to a global limit which is a transformation of the true weight under a novel sparsifying operation. We found explicit error estimates of sparse weights from the true weights.
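
To make the thresholding idea in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm. It shows the standard elementwise thresholding maps for the $\ell_1$ and $\ell_0$ penalties and one generic way to alternate thresholding with a gradient step. The names `soft_threshold`, `hard_threshold`, `splitting_step`, and the parameters `lam`, `beta`, `eta` are hypothetical. The quadratic coupling between the weight `w` and its sparse copy `u` is an assumption about how a variable splitting scheme is commonly set up. The transformed-$\ell_1$ thresholding and any normalization of `w` are left out.

```python
import math


def soft_threshold(w, lam):
    """Elementwise proximal map of lam * |x| (the l1 penalty)."""
    out = []
    for x in w:
        mag = abs(x) - lam
        if mag <= 0:
            out.append(0.0)          # small entries are set to zero
        else:
            out.append(mag if x > 0 else -mag)  # large entries shrink by lam
    return out


def hard_threshold(w, lam):
    """Elementwise proximal map of lam * 1[x != 0] (the l0 penalty).

    Entries with |x| <= sqrt(2 * lam) are zeroed, the rest are kept as-is.
    """
    cutoff = math.sqrt(2.0 * lam)
    return [x if abs(x) > cutoff else 0.0 for x in w]


def splitting_step(w, grad_fn, lam, beta, eta):
    """One alternation of thresholding and a gradient step (assumed form).

    grad_fn is a hypothetical callback returning a (possibly coarse)
    gradient of the loss at w. The coupling term beta * (w - u) is an
    assumption of this sketch, not taken from the summary above.
    """
    u = soft_threshold(w, lam / beta)            # sparse copy of w
    g = grad_fn(w)                               # loss gradient surrogate
    w_new = [wi - eta * (gi + beta * (wi - ui))  # pull w toward u
             for wi, gi, ui in zip(w, g, u)]
    return u, w_new


if __name__ == "__main__":
    w0 = [1.5, -0.2, -2.0]
    print(soft_threshold(w0, 0.5))
    print(hard_threshold(w0, 0.5))
    # Zero gradient callback, used only to exercise the update.
    print(splitting_step([1.0, 0.1], lambda w: [0.0] * len(w), 0.2, 1.0, 0.5))
```

In a training loop, `grad_fn` would be the place to plug in a coarse gradient, that is, a surrogate for the gradient of the loss through the binarized activation, whose true derivative is zero almost everywhere.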

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.09731/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1901.09731/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1901.09731/full.md

---
Source: https://tomesphere.com/paper/1901.09731