$S^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks
Xinlin Li, Bang Liu, Yaoliang Yu, Wulong Liu, Chunjing Xu, Vahid Partovi Nia

TL;DR
This paper introduces $S^3$, a reparameterization technique for low-bit shift neural networks that improves training stability and performance, enabling 3-bit shift networks to outperform full-precision models on ImageNet.
Contribution
The paper proposes a novel sign-sparse-shift reparameterization method that enhances the training of low-bit shift networks, addressing initialization sensitivity and vanishing gradient issues.
Findings
3-bit shift networks outperform full-precision models on ImageNet.
The method reduces sensitivity to weight initialization.
The method efficiently learns low-bit networks with weight dynamics similar to those of full-precision networks.
Abstract
Shift neural networks reduce computational complexity by removing expensive multiplication operations and quantizing continuous weights into low-bit discrete values, which makes them fast and energy efficient compared to conventional neural networks. However, existing shift networks are sensitive to weight initialization and suffer degraded performance caused by the vanishing gradient and weight sign freezing problems. To address these issues, we propose $S^3$ low-bit re-parameterization, a novel technique for training low-bit shift networks. Our method decomposes a discrete parameter in a sign-sparse-shift 3-fold manner. In this way, it efficiently learns a low-bit network with weight dynamics similar to those of full-precision networks and is insensitive to weight initialization. Our proposed training method pushes the boundaries of shift neural networks and shows 3-bit shift networks out-performs…
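To make the sign-sparse-shift idea concrete, the following is a minimal sketch of how a discrete shift weight could be composed from three kinds of binary factors, and how multiplying by such a weight reduces to a bit shift. The abstract excerpt does not give the exact formulation, so the encoding here (one sign bit, one sparsity bit, two shift bits) and the names `s3_weight`, `shift_multiply`, and `representable_values` are assumptions chosen for illustration. Training is not shown.

```python
from itertools import product


def s3_weight(sign_bit, sparse_bit, shift_bits):
    """Compose a discrete weight from binary factors.

    Assumed encoding (illustrative, not taken from the paper):
      sign_bit   -> +1 if 1, -1 if 0
      sparse_bit -> 0 zeroes the weight, 1 keeps it
      shift_bits -> each set bit adds one to the power-of-two exponent
    """
    sign = 1 if sign_bit else -1
    exponent = sum(shift_bits)
    return sign * sparse_bit * (1 << exponent)


def shift_multiply(x, sign_bit, sparse_bit, shift_bits):
    """Multiply integer x by the weight using only a shift and a negation."""
    if not sparse_bit:
        return 0
    shifted = x << sum(shift_bits)
    return shifted if sign_bit else -shifted


def representable_values(num_shift_bits=2):
    """Enumerate every weight value reachable under the assumed encoding."""
    values = set()
    for bits in product((0, 1), repeat=2 + num_shift_bits):
        sign_bit, sparse_bit, shift_bits = bits[0], bits[1], bits[2:]
        values.add(s3_weight(sign_bit, sparse_bit, shift_bits))
    return sorted(values)


if __name__ == "__main__":
    print(representable_values())
    print(shift_multiply(5, 1, 1, (1, 1)), 5 * s3_weight(1, 1, (1, 1)))
```

Under this assumed encoding, two shift bits give seven distinct weight values (zero plus signed powers of two up to 4), which fit in 3 bits. Because every nonzero weight is a signed power of two, `shift_multiply` gives the same result as an integer multiplication by that weight.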
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Brain Tumor Detection and Classification
