PROM: Prioritize Reduction of Multiplications Over Lower Bit-Widths for Efficient CNNs
Lukas Meiner, Jens Mehnert, Alexandru Paul Condurache

TL;DR
PROM introduces a novel quantization method for depthwise-separable CNNs, selectively using ternary and 8-bit weights to significantly reduce energy and storage costs while maintaining accuracy.
Contribution
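The paper proposes a simple quantization scheme that assigns bit-widths according to where the computational cost sits in modern depthwise-separable CNNs: the expensive pointwise convolutions get the lowest precision, and the remaining modules keep a higher one.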
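The split between ternary and 8-bit weights can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. This summary does not specify the quantizers, so the threshold rule (0.7 times the mean absolute weight, a heuristic from earlier ternary-weight work), the symmetric 8-bit quantizer, and the names `ternarize`, `quantize_int8`, and `quantize_layer` are all hypothetical. The sketch quantizes fixed weights only and leaves out the quantization-aware training step.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map each weight to a code in {-1, 0, +1} plus one shared scale.

    Assumption: threshold = threshold_ratio * mean(|w|), a common
    heuristic from ternary-weight literature, not taken from PROM.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    # The scale is the mean magnitude of the weights that were not zeroed.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def quantize_int8(weights):
    """Symmetric 8-bit quantization: integer codes in [-127, 127] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def quantize_layer(kind, weights):
    """Hypothetical dispatcher: ternary for pointwise layers, 8-bit otherwise."""
    if kind == "pointwise":
        return ternarize(weights)
    return quantize_int8(weights)


example = [0.9, -0.8, 0.05, -0.02, 0.4]
pointwise_codes, pointwise_scale = quantize_layer("pointwise", example)
depthwise_codes, depthwise_scale = quantize_layer("depthwise", example)
```

With ternary codes, each product of a weight and an activation reduces to adding, subtracting, or skipping the activation, followed by a single multiplication by the shared scale. That is one way to read the "reduction of multiplications" in the title.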
Findings
Reduces energy cost by 23.9x on MobileNetV2
Decreases storage size by 2.7x compared to float16
Maintains similar classification accuracy on ImageNet
Abstract
Convolutional neural networks (CNNs) are crucial for computer vision tasks on resource-constrained devices. Quantization effectively compresses these models, reducing storage size and energy cost. However, in modern depthwise-separable architectures, the computational cost is distributed unevenly across their components, with pointwise operations being the most expensive. By applying a general quantization scheme to this imbalanced cost distribution, existing quantization approaches fail to fully exploit potential efficiency gains. To this end, we introduce PROM, a straightforward approach for quantizing modern depthwise-separable convolutional networks by selectively using two distinct bit-widths. Specifically, pointwise convolutions are quantized to ternary weights, while the remaining modules use 8-bit weights, which is achieved through a simple quantization-aware training procedure.…
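The uneven cost distribution mentioned in the abstract can be checked with a quick count of multiply-accumulate operations (MACs). The sketch below is illustrative only: the layer shape is a hypothetical example, not a figure from the paper, and it assumes stride 1 with the same output resolution for both convolutions.

```python
def depthwise_macs(height, width, channels, kernel=3):
    """MACs of a k x k depthwise convolution (one filter per channel)."""
    return height * width * channels * kernel * kernel


def pointwise_macs(height, width, in_channels, out_channels):
    """MACs of a 1x1 (pointwise) convolution that mixes channels."""
    return height * width * in_channels * out_channels


# Hypothetical layer shape, chosen for illustration only.
h, w, c_in, c_out = 14, 14, 384, 64
dw = depthwise_macs(h, w, c_in)
pw = pointwise_macs(h, w, c_in, c_out)
print(f"depthwise: {dw:,} MACs, pointwise: {pw:,} MACs, ratio: {pw / dw:.1f}x")
```

Under these assumptions the ratio is simply the number of output channels divided by the squared kernel size, so the pointwise convolution dominates whenever the output channel count exceeds k * k (9 for a 3x3 kernel).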
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Generative Adversarial Networks and Image Synthesis
Methods: Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Batch Normalization · Average Pooling · Inverted Residual Block · Convolution · 1x1 Convolution
