AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Cédric Gernigon (TARAN); Silviu-Ioan Filip (TARAN); Olivier Sentieys (TARAN); Clément Coggiola (CNES); Mickael Bruno (CNES)

arXiv:2404.16876·cs.LG·April 29, 2024

TL;DR

AdaQAT introduces a learning-based, adaptive method for optimizing bit-widths in quantization-aware training, enabling efficient mixed-precision DNN inference with minimal manual tuning.

Contribution

The paper proposes a gradient-based approach that automatically optimizes weight and activation bit-widths during training, and that applies both when training from scratch and when fine-tuning.

Findings

01. Competitive accuracy on CIFAR-10 and ImageNet.

02. Effective both when training from scratch and when fine-tuning.

03. Flexible enough to handle mixed-precision uniform quantization problems.

Abstract

Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging. In this work, we present Adaptive Bit-Width Quantization Aware Training (AdaQAT), a learning-based method that automatically optimizes weight and activation signal bit-widths during training for more efficient DNN inference. We use relaxed real-valued bit-widths that are updated using a gradient descent rule, but are otherwise discretized for all quantization operations. The result is a simple and flexible QAT approach for mixed-precision uniform quantization problems. Compared to other methods that are generally designed to be run…
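
The core idea in the abstract (a real-valued bit-width that is updated by gradient descent but discretized whenever a tensor is actually quantized) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the paper's implementation: the function names, the symmetric quantizer on [-1, 1], the use of the ceiling for discretization, the finite-difference gradient estimate between neighbouring integer bit-widths, and the `penalty` term standing in for a hardware cost are all hypothetical choices.

```python
import math


def quantize_uniform(values, bits):
    """Symmetric uniform quantization of values clipped to [-1, 1]."""
    levels = 2 ** (bits - 1) - 1  # number of positive integer levels
    if levels < 1:
        raise ValueError("bits must be at least 2")
    return [round(max(-1.0, min(1.0, v)) * levels) / levels for v in values]


def discretize(real_bits):
    """Map a relaxed real-valued bit-width to the integer used for quantization.

    Assumption: round up, with a floor of 2 bits.
    """
    return max(2, math.ceil(real_bits))


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def update_bit_width(real_bits, values, lr=0.5, penalty=1e-4):
    """One gradient-descent step on the relaxed bit-width.

    Assumption: the derivative of the error with respect to the bit-width is
    approximated by the difference between the errors at the two neighbouring
    integer bit-widths; `penalty` is a hypothetical per-bit cost that pushes
    the bit-width down.
    """
    hi = discretize(real_bits)
    lo = max(2, hi - 1)
    err_hi = mse(values, quantize_uniform(values, hi))
    err_lo = mse(values, quantize_uniform(values, lo))
    grad = (err_hi - err_lo) + penalty
    return real_bits - lr * grad


if __name__ == "__main__":
    weights = [k / 15 for k in range(-15, 16)]  # toy "weights" in [-1, 1]
    b = 4.5
    for step in range(3):
        b = update_bit_width(b, weights, penalty=0.01)
        print(step, b, discretize(b))
```

In this toy setting the quantization error stands in for the task loss. A larger `penalty` drives the relaxed bit-width down, while a larger error gap between neighbouring bit-widths drives it up; only `discretize(b)` is ever passed to the quantizer.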
