Training Sparse Neural Network by Constraining Synaptic Weight on Unit Lp Sphere
Weipeng Li, Xiaogang Yang, Chuanxiang Li, Ruitao Lu, Xueli Xie

TL;DR
This paper trains sparse neural networks by constraining their weights to a unit Lp-sphere and optimizing them with a new gradient descent algorithm, Lp-spherical gradient descent (LpSGD). The algorithm comes with a convergence proof, and the paper adds practical techniques for evolving the network topology.
Contribution
The paper proposes Lp-spherical gradient descent (LpSGD) for optimizing weights under the unit Lp-sphere constraint, analyzes how p affects Hoyer's sparsity, and introduces semi-pruning for topology evolution.
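The summary does not give the LpSGD update, which the paper derives from an augmented Empirical Risk Minimization condition. The sketch that follows is only a generic projected-gradient step on the unit Lp-sphere, assuming p > 1 so the constraint normal is finite everywhere: it removes the gradient component along the normal of sum |w_i|^p = 1, steps along the remaining tangent direction, and rescales the result back onto the sphere. The function names (lp_norm, project_to_lp_sphere, lp_sphere_step) and the toy quadratic loss are hypothetical and are not the authors' implementation.

```python
import math
from typing import List


def lp_norm(w: List[float], p: float) -> float:
    """(sum |w_i|^p)^(1/p); only a quasi-norm when 0 < p < 1."""
    return sum(abs(x) ** p for x in w) ** (1.0 / p)


def project_to_lp_sphere(w: List[float], p: float) -> List[float]:
    """Radially rescale w so its Lp norm is 1 (assumes w is not all zeros)."""
    r = lp_norm(w, p)
    return [x / r for x in w]


def lp_sphere_step(w: List[float], grad: List[float], p: float, lr: float) -> List[float]:
    """One hypothetical projected-gradient step on the unit Lp-sphere (not the paper's LpSGD).

    1. The constraint sum |w_i|^p = 1 has a normal proportional to sign(w_i) |w_i|^(p - 1).
    2. Subtract the gradient's component along that normal (tangent projection).
    3. Step along the tangent direction, then rescale back onto the sphere.
    """
    normal = [math.copysign(abs(x) ** (p - 1), x) if x != 0 else 0.0 for x in w]
    nn = sum(n * n for n in normal)
    coef = sum(g * n for g, n in zip(grad, normal)) / nn if nn > 0 else 0.0
    tangent = [g - coef * n for g, n in zip(grad, normal)]
    stepped = [x - lr * t for x, t in zip(w, tangent)]
    return project_to_lp_sphere(stepped, p)


# Toy usage: pull a weight vector toward a target while staying on the unit L1.5 sphere.
target = [0.9, 0.1, 0.0, -0.3]
p = 1.5


def loss(w: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(w, target))


def grad(w: List[float]) -> List[float]:
    return [2 * (a - b) for a, b in zip(w, target)]


w = project_to_lp_sphere([0.5, -0.5, 0.5, 0.5], p)
start = loss(w)
for _ in range(200):
    w = lp_sphere_step(w, grad(w), p, lr=0.05)
```

Radial rescaling is used as the retraction because it is the simplest map back onto the sphere. The paper's derived update may handle the normal and step size differently, particularly for p < 1, where |w_i|^(p - 1) is unbounded near zero.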
Findings
LpSGD is proved to converge, and experiments confirm its convergence.
The expected Hoyer's sparsity can be predicted under a gamma-distribution hypothesis, and the predictions are verified at various p.
Experiments on benchmark datasets validate the method's effectiveness.
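Hoyer's sparsity of an n-dimensional vector x is (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1). It is 0 when all entries have equal magnitude and 1 for a one-hot vector. The sketch that follows computes it and gives a simple large-n approximation of its expectation, assuming the weight magnitudes are i.i.d. Gamma(shape k, scale theta). The approximation replaces ||x||_1 and ||x||_2 with their expected values, using E|w| = k*theta and E[w^2] = k(k + 1)*theta^2, so theta cancels. This plug-in estimate is an assumption made here and may differ from the expression derived in the paper. The names hoyer_sparsity and approx_expected_hoyer_gamma are hypothetical.

```python
import math
import random
from typing import List


def hoyer_sparsity(x: List[float]) -> float:
    """Hoyer's sparsity; requires len(x) > 1 and x not all zeros."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def approx_expected_hoyer_gamma(k: float, n: int) -> float:
    """Plug-in approximation of E[Hoyer] for |w_i| ~ Gamma(shape=k, scale=theta).

    l1 ~ n*k*theta and l2 ~ theta*sqrt(n*k*(k + 1)), so l1/l2 ~ sqrt(n) * sqrt(k / (k + 1)).
    """
    ratio = math.sqrt(n) * math.sqrt(k / (k + 1))
    return (math.sqrt(n) - ratio) / (math.sqrt(n) - 1)


# Monte Carlo check against a sampled gamma-distributed weight vector.
random.seed(0)
k, n = 0.5, 10_000
sample = [random.gammavariate(k, 1.0) for _ in range(n)]
empirical = hoyer_sparsity(sample)
predicted = approx_expected_hoyer_gamma(k, n)
```

In this approximation a smaller shape k, meaning mass concentrated near zero, gives a higher expected sparsity. How p maps onto the gamma parameters is part of the paper's analysis and is not reproduced here.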
Abstract
Sparse deep neural networks have shown advantages over dense models, with fewer parameters and higher computational efficiency. Here we demonstrate that constraining the synaptic weights to the unit Lp-sphere enables flexible control of sparsity through p and improves the generalization ability of neural networks. First, to optimize synaptic weights constrained to the unit Lp-sphere, we derive the parameter optimization algorithm Lp-spherical gradient descent (LpSGD) from the augmented Empirical Risk Minimization condition and prove that it converges. To understand how p affects Hoyer's sparsity, we derive the expectation of Hoyer's sparsity under a gamma-distribution hypothesis and verify the predictions at various p under different conditions. In addition, "semi-pruning" and threshold adaptation are designed for topology…
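The abstract is cut off before semi-pruning and threshold adaptation are described, so their exact rules are unknown here. The sketch that follows illustrates only a generic stand-in: a magnitude threshold chosen so that roughly a target fraction of weights falls below it, together with a binary mask returned separately from the weights. If the caller keeps the dense weights and recomputes the mask periodically, pruned connections can return, which is one common way to let topology evolve. The names adaptive_threshold and threshold_mask and the target_sparsity parameter are hypothetical and are not the paper's semi-pruning.

```python
from typing import List, Tuple


def adaptive_threshold(w: List[float], target_sparsity: float) -> float:
    """Hypothetical: choose a magnitude threshold so about target_sparsity of weights lie at or below it."""
    mags = sorted(abs(x) for x in w)
    k = int(target_sparsity * len(mags))
    if k == 0:
        return 0.0
    return mags[k - 1]


def threshold_mask(w: List[float], thr: float) -> Tuple[List[float], List[int]]:
    """Hypothetical: mask weights with |w| <= thr.

    Returns a masked copy plus the mask; the input list is left unchanged so
    the mask can be recomputed later and pruned connections may reappear.
    """
    mask = [0 if abs(x) <= thr else 1 for x in w]
    return [x * m for x, m in zip(w, mask)], mask


w = [0.5, -0.05, 0.2, 0.01, -0.7, 0.03, 0.0, 0.4]
thr = adaptive_threshold(w, target_sparsity=0.5)
pruned, mask = threshold_mask(w, thr)
```

Ties at the threshold can mask slightly more than the target fraction; a production version would also need to decide how often to update the threshold during training.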
Taxonomy
Topics: Machine Learning and ELM · Advanced Memory and Neural Computing · Domain Adaptation and Few-Shot Learning
