Bayesian Bits: Unifying Quantization and Pruning
Mart van Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling

TL;DR
Bayesian Bits presents a unified, gradient-based approach for joint quantization and pruning of neural networks, optimizing bit widths and sparsity to improve accuracy-efficiency trade-offs.
Contribution
The paper introduces Bayesian Bits, a novel method that combines quantization and pruning through a decomposition of the quantization operation and learnable stochastic gates.
Findings
Achieves better accuracy-efficiency trade-offs than models with a static (fixed) bit width.
Produces hardware-friendly configurations with power-of-two bit widths.
Validates effectiveness on multiple benchmark datasets.
Abstract
We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization. Bayesian Bits employs a novel decomposition of the quantization operation, which sequentially considers doubling the bit width. At each new bit width, the residual error between the full precision value and the previously rounded value is quantized. We then decide whether or not to add this quantized residual error for a higher effective bit width and lower quantization noise. By starting with a power-of-two bit width, this decomposition will always produce hardware-friendly configurations, and through an additional 0-bit option, serves as a unified view of pruning and quantization. Bayesian Bits then introduces learnable stochastic gates, which collectively control the bit width of the given tensor. As a result, we can obtain low bit solutions by…
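The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `gated_residual_quantize`, the clipping range `alpha`/`beta`, the bit-width schedule `(2, 4, 8)`, and the step-size rule `(beta - alpha) / (2**b - 1)` are assumptions made for illustration. The gates here are fixed 0/1 values supplied by the caller, whereas the method itself learns stochastic gates.

```python
import numpy as np


def gated_residual_quantize(x, gates, alpha=-1.0, beta=1.0, bit_widths=(2, 4, 8)):
    """Quantize x by successively adding gated, quantized residuals.

    Hypothetical helper for illustration only.
    gates: dict mapping each bit width to 0 (off) or 1 (on).
    Returns the quantized tensor and the effective bit width.
    """
    # Assumption: values are clipped to a fixed range [alpha, beta].
    x = np.clip(np.asarray(x, dtype=np.float64), alpha, beta)

    # 0-bit option: if the first gate is off, the tensor is pruned.
    if not gates[bit_widths[0]]:
        return np.zeros_like(x), 0

    # Start from the lowest power-of-two bit width.
    step = (beta - alpha) / (2 ** bit_widths[0] - 1)
    x_q = alpha + step * np.round((x - alpha) / step)
    effective_bits = bit_widths[0]

    # Each further stage doubles the bit width and quantizes the residual
    # between the full-precision value and the current rounded value.
    for b in bit_widths[1:]:
        if not gates[b]:
            break  # gates are nested: a closed gate disables all later stages
        step = (beta - alpha) / (2 ** b - 1)
        residual_q = step * np.round((x - x_q) / step)
        x_q = x_q + residual_q
        effective_bits = b

    return x_q, effective_bits


if __name__ == "__main__":
    x = np.array([-0.9, -0.3, 0.05, 0.62, 1.0])
    for gates in ({2: 1, 4: 1, 8: 1}, {2: 1, 4: 0, 8: 1}, {2: 0, 4: 1, 8: 1}):
        x_q, bits = gated_residual_quantize(x, gates)
        print(bits, np.max(np.abs(x - x_q)))
```

In this sketch, opening more gates can only lower the worst-case rounding error, because each stage rounds the remaining residual to a finer grid. Closing the first gate zeroes the tensor, which is the sense in which pruning appears as a 0-bit case of quantization.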
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
Methods: Pruning
