Ternary MobileNets via Per-Layer Hybrid Filter Banks
Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina

TL;DR
This paper introduces a novel per-layer hybrid filter bank quantization method for MobileNets, enabling significant energy and size reductions while maintaining accuracy and throughput on specialized hardware.
Contribution
It proposes a new hybrid filter bank quantization approach that combines full-precision and ternary filters within each layer of MobileNets, improving efficiency while maintaining comparable accuracy.
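To make the idea of a hybrid filter bank concrete, here is a minimal sketch of one layer in which some filters are replaced by ternary approximations and the rest stay full precision. It is not the authors' method. The ternarizer is the threshold rule from Ternary Weight Networks (threshold 0.7 times the mean absolute weight, one scale per filter), swapped in for illustration. The rule for choosing which filters to ternarize is not given in this summary, so the caller passes a hypothetical index set `ternary_idx`; the names `ternarize` and `hybrid_filter_bank` are likewise illustrative.

```python
from typing import List, Set, Tuple

Filter = List[float]  # one convolutional filter, flattened to a vector


def ternarize(weights: Filter) -> Tuple[List[int], float]:
    """TWN-style ternarization (swapped-in technique, not the paper's).

    Weights above +delta map to +1, below -delta to -1, the rest to 0,
    where delta = 0.7 * mean|w|. The scale alpha is the mean magnitude
    of the weights that survive the threshold.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = 0.7 * mean_abs
    codes = [1 if w > delta else (-1 if w < -delta else 0) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


def hybrid_filter_bank(filters: List[Filter], ternary_idx: Set[int]) -> List[Filter]:
    """Replace the filters listed in the hypothetical `ternary_idx` with
    alpha * codes; every other filter is copied at full precision."""
    layer = []
    for i, f in enumerate(filters):
        if i in ternary_idx:
            codes, alpha = ternarize(f)
            layer.append([alpha * c for c in codes])
        else:
            layer.append(list(f))
    return layer


# Usage: ternarize filter 0, keep filter 1 in full precision.
layer = hybrid_filter_bank([[0.9, -0.8, 0.05, -0.02], [0.3, -0.1, 0.2, 0.0]], {0})
```

In the usage line, filter 0 becomes approximately `[0.85, -0.85, 0.0, 0.0]` (its two large weights share one scale, and the two small ones are pruned to zero), while filter 1 is unchanged.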
Findings
Achieved 27.98% energy savings.
Reduced model size by 51.07%.
Maintained comparable accuracy and throughput.
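The size reduction from a hybrid layer depends on what fraction of weights is ternarized and on the baseline bit width, neither of which this summary states. The sketch below shows only that relationship. It assumes a 32-bit baseline, 2-bit storage per ternary weight, and ignores per-filter scale factors; it is not meant to reproduce the reported 51.07% figure, and `size_reduction` is a hypothetical helper.

```python
def size_reduction(ternary_fraction: float, full_bits: int = 32,
                   ternary_bits: int = 2) -> float:
    """Fractional model-size reduction when `ternary_fraction` of the weights
    move from `full_bits` to `ternary_bits` storage (assumed bit widths;
    per-filter scale factors are ignored)."""
    if not 0.0 <= ternary_fraction <= 1.0:
        raise ValueError("ternary_fraction must be in [0, 1]")
    return ternary_fraction * (full_bits - ternary_bits) / full_bits


# Ternarizing half of the weights under these assumptions saves 15/32 of the storage.
half = size_reduction(0.5)
```

Under these assumptions, even ternarizing every weight caps the saving at 30/32 (93.75%), so a hybrid model's reduction scales roughly linearly with the ternarized fraction.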
Abstract
The MobileNets family of computer vision neural networks has fueled tremendous progress in the design and organization of resource-efficient architectures in recent years. New applications with stringent real-time requirements on highly constrained devices require further compression of already compute-efficient networks such as MobileNets. Model quantization is a widely used technique to compress and accelerate neural network inference, and prior works have quantized MobileNets to 4 to 6 bits, albeit with a modest to significant drop in accuracy. While quantization to sub-byte values (i.e., precision below 8 bits) has been valuable, even further quantization of MobileNets to binary or ternary values is necessary to realize significant energy savings and possibly runtime speedups on specialized hardware, such as ASICs and FPGAs. Under the key observation that convolutional filters…
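Part of why ternary weights promise energy savings on ASICs and FPGAs is that a dot product with weights in {-α, 0, +α} needs no per-weight multiplication: each term is an addition, a subtraction, or a skip, followed by one multiply by the scale α. The sketch below contrasts the two paths. It assumes a single scale per filter, and the function names are illustrative; it does not model the paper's hardware.

```python
from typing import List


def float_dot(x: List[float], w: List[float]) -> float:
    # Full-precision path: one multiply-accumulate per weight.
    return sum(xi * wi for xi, wi in zip(x, w))


def ternary_dot(x: List[float], codes: List[int], alpha: float) -> float:
    # Ternary path: codes are in {-1, 0, +1}, so each input is added,
    # subtracted, or skipped; only the final scaling by alpha multiplies.
    acc = 0.0
    for xi, c in zip(x, codes):
        if c == 1:
            acc += xi
        elif c == -1:
            acc -= xi
    return alpha * acc


x = [1.0, 2.0, 3.0, 4.0]
ternary_result = ternary_dot(x, [1, -1, 0, 1], 0.5)
float_result = float_dot(x, [0.5, -0.5, 0.0, 0.5])
```

Both calls compute the same value here (1.5), but the ternary path performs one multiplication instead of four, and zero-valued codes skip their inputs entirely.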
