Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

Matteo Risso; Alessio Burrello; Luca Benini; Enrico Macii; Massimo Poncino; Daniele Jahier Pagliari

arXiv:2206.08852·cs.LG·January 26, 2023

TL;DR

This paper introduces a neural architecture search (NAS) method that assigns a mixed-precision bit-width to each weight tensor channel independently, cutting memory usage by up to 63% and energy consumption by up to 27% on edge devices while maintaining accuracy.

Contribution

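The paper proposes a NAS approach that selects the bit-width of each weight tensor channel independently. This widens the search space beyond layer-wise methods and lets the tool assign higher precision only to the weights associated with the most informative features.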
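
As a rough illustration of how much the search space grows, consider weight bit-widths only. The symbols here are assumptions made for this sketch, not the paper's notation: $\mathcal{P}$ is the set of candidate bit-widths, $N$ is the number of layers, and $C_l$ is the number of weight tensor channels in layer $l$.

```latex
\[
|\mathcal{S}_{\text{layer}}| = |\mathcal{P}|^{N},
\qquad
|\mathcal{S}_{\text{channel}}| = |\mathcal{P}|^{\sum_{l=1}^{N} C_l}
\]
```

For instance, with three candidate bit-widths and a single layer of 16 channels, a layer-wise assignment has 3 options, while a channel-wise assignment has $3^{16} = 43{,}046{,}721$.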

Findings

1. Achieved up to 63% reduction in memory usage.
2. Reduced energy consumption by up to 27%.
3. Produced Pareto-optimal models balancing accuracy and efficiency.

Abstract

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially with optimized bit-width assignments determined by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision quantization works layer-wise, i.e., it uses different bit-widths for the weight and activation tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning a higher precision only to the weights associated with the most informative features.…
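
To make the idea of a per-channel bit-width concrete, here is a minimal sketch in plain Python. It is not the paper's NAS and not code from the linked repository: it only shows what a channel-wise assignment looks like once the bit-widths are given. The function names, the symmetric uniform quantizer, and the example bit-widths (8, 4, 2) are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def quantize_channel(weights: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform fake-quantization of one channel (assumed scheme)."""
    if bits < 2:
        raise ValueError("this sketch assumes at least 2 bits per weight")
    qmax = 2 ** (bits - 1) - 1  # largest integer level, e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / qmax  # one scale per channel
    quantized = []
    for w in weights:
        level = round(w / scale)                 # nearest integer level
        level = max(-qmax, min(qmax, level))     # clamp to the signed range
        quantized.append(level * scale)          # map back to a real value
    return quantized


def quantize_channel_wise(
    weight: Sequence[Sequence[float]], bits_per_channel: Sequence[int]
) -> List[List[float]]:
    """Quantize each channel (row) of a weight tensor with its own bit-width."""
    if len(weight) != len(bits_per_channel):
        raise ValueError("need exactly one bit-width per channel")
    return [quantize_channel(ch, b) for ch, b in zip(weight, bits_per_channel)]


def weight_memory_bits(
    weight: Sequence[Sequence[float]], bits_per_channel: Sequence[int]
) -> int:
    """Total weight storage in bits, ignoring scales and other metadata."""
    return sum(len(ch) * b for ch, b in zip(weight, bits_per_channel))


# Hypothetical 3-channel weight tensor with 4 weights per channel.
W = [
    [0.90, -0.35, 0.10, 0.62],
    [0.05, 0.02, -0.04, 0.01],
    [1.00, -1.00, 0.00, 1.00],
]

layer_wise = [8, 8, 8]    # one bit-width shared by the whole layer
channel_wise = [8, 4, 2]  # illustrative per-channel assignment

print(weight_memory_bits(W, layer_wise))    # 96
print(weight_memory_bits(W, channel_wise))  # 56
W_q = quantize_channel_wise(W, channel_wise)
```

In this toy case the channel-wise assignment stores the same tensor in 56 bits instead of 96. Which channels can tolerate the lower precision is exactly what the search has to decide; the sketch takes that decision as given.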

Code & Models

Repositories

eml-eda/multi-prec-nas (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM