Ternary MobileNets via Per-Layer Hybrid Filter Banks

Dibakar Gope; Jesse Beu; Urmish Thakker; Matthew Mattina

arXiv:1911.01028·cs.LG·November 5, 2019

TL;DR

This paper introduces a novel per-layer hybrid filter bank quantization method for MobileNets, enabling significant energy and size reductions while maintaining accuracy and throughput on specialized hardware.

Contribution

It proposes a per-layer hybrid filter bank quantization approach for MobileNets that combines full-precision and ternary filters, improving efficiency while keeping accuracy comparable.
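To make the idea of a mixed bank concrete, here is a minimal sketch in Python. It is not the authors' method: the threshold rule (0.7 times the mean absolute weight) is borrowed from the Ternary Weight Networks heuristic, and the function names `ternarize` and `hybrid_filter_bank`, as well as the caller-supplied set of filters to keep in full precision, are hypothetical and chosen only for illustration.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Approximate a flat list of floats as scale * codes, codes in {-1, 0, +1}.

    Assumption: threshold = threshold_ratio * mean(|w|), a common heuristic,
    not necessarily the rule used in the paper.
    """
    if not weights:
        return 0.0, []
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    # Small weights become 0; the rest keep only their sign.
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    # The scale is the mean magnitude of the weights that survived the threshold.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return scale, codes


def hybrid_filter_bank(filters, full_precision_ids):
    """Build one layer's bank: some filters stay float, the rest go ternary.

    Assumption: `full_precision_ids` is chosen by the caller; how the paper
    selects these filters is not described in this listing.
    """
    bank = []
    for i, f in enumerate(filters):
        if i in full_precision_ids:
            bank.append(("fp", list(f)))
        else:
            bank.append(("ternary", ternarize(f)))
    return bank


if __name__ == "__main__":
    layer = [[0.9, -0.8, 0.05, 0.0], [0.2, -0.3, 0.1, 0.4]]
    for kind, payload in hybrid_filter_bank(layer, full_precision_ids={1}):
        print(kind, payload)
```

In this sketch each ternary filter stores one float scale plus a code per weight, which is where the size reduction would come from; the filters left in full precision are the ones assumed to be too sensitive to quantize.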

Findings

01

Achieved 27.98% energy savings.

02

Reduced model size by 51.07%.

03

Maintained comparable accuracy and throughput.

Abstract

The MobileNets family of computer vision neural networks has fueled tremendous progress in the design and organization of resource-efficient architectures in recent years. New applications with stringent real-time requirements on highly constrained devices require further compression of already compute-efficient MobileNets-like networks. Model quantization is a widely used technique to compress and accelerate neural network inference, and prior works have quantized MobileNets to 4-6 bits, albeit with a modest to significant drop in accuracy. While quantization to sub-byte values (i.e. precision less than 8 bits) has been valuable, even further quantization of MobileNets to binary or ternary values is necessary to realize significant energy savings and possibly runtime speedups on specialized hardware, such as ASICs and FPGAs. Under the key observation that convolutional filters…
