Nearly Lossless Adaptive Bit Switching

Haiduo Huang; Zhenhua Liu; Tian Xia; Wenzhe Zhao; Pengju Ren

arXiv:2502.01199 · cs.CV · February 4, 2025

TL;DR

This paper introduces a nearly lossless adaptive bit-switching method for neural network quantization. It enables efficient one-shot multi-precision training with minimal accuracy loss and reduced storage, and it applies to classification, detection, segmentation, and LLM tasks.

Contribution

It proposes the Double Rounding quantization technique and an Adaptive Learning Rate Scaling method to improve one-shot joint training of multi-precision neural networks.
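A minimal sketch of two-step ("double") rounding follows. It reflects our reading of the method's name and the abstract, not the official code: weights are first rounded to the highest integer precision h with a scale s_h, and a lower precision l is then obtained by rounding those stored integers again after dividing by 2^(h-l), which gives an effective scale of s_h·2^(h-l). Symmetric quantization without a zero-point, the function names, and the example values are assumptions for illustration; Adaptive Learning Rate Scaling is not modeled.

```python
# Hypothetical sketch of two-step ("double") rounding for bit switching.
# Assumptions: symmetric signed quantization, no zero-point, a fixed scale s_h.


def clip(x: int, lo: int, hi: int) -> int:
    """Clamp x into [lo, hi]."""
    return max(lo, min(hi, x))


def quantize_highest(weights: list[float], s_h: float, h: int = 8) -> list[int]:
    """First rounding: FP32 weights -> signed h-bit integer codes."""
    qmin, qmax = -(2 ** (h - 1)), 2 ** (h - 1) - 1
    return [clip(round(w / s_h), qmin, qmax) for w in weights]


def switch_bits(q_h: list[int], h: int, l: int) -> list[int]:
    """Second rounding: stored h-bit codes -> signed l-bit codes.

    Only the integer codes are needed; no FP32 copy of the weights is used.
    """
    shift = 2 ** (h - l)
    qmin, qmax = -(2 ** (l - 1)), 2 ** (l - 1) - 1
    return [clip(round(q / shift), qmin, qmax) for q in q_h]


def dequantize(q: list[int], s_h: float, h: int, b: int) -> list[float]:
    """Map b-bit codes back to real values using the derived scale s_h * 2**(h - b)."""
    s_b = s_h * 2 ** (h - b)
    return [v * s_b for v in q]


if __name__ == "__main__":
    w = [0.51, -0.27, 0.03, -1.20]
    s_h = 0.01
    q8 = quantize_highest(w, s_h, h=8)  # [51, -27, 3, -120]
    q4 = switch_bits(q8, h=8, l=4)      # [3, -2, 0, -8]
    print(q8, q4)
    print(dequantize(q4, s_h, h=8, b=4))
```

Because the lower-precision codes depend only on the stored 8-bit integers and s_h, switching bit-widths does not require the FP32 weights. Python's `round()` rounds halves to even, as `torch.round` does, which is why -7.5 maps to -8 above.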

Findings

1. Achieves nearly lossless bit switching with reduced storage.
2. Outperforms state-of-the-art multi-precision quantization methods.
3. Validates effectiveness on classification, detection, segmentation, and LLM tasks.

Abstract

Model quantization is widely applied for compressing and accelerating deep neural networks (DNNs). However, conventional Quantization-Aware Training (QAT) focuses on training DNNs with a uniform bit-width. Since the required bit-width varies across hardware and transmission demands, this induces considerable training and storage costs. Hence, one-shot joint training of multiple precisions has been proposed to address this issue. Previous works either store a larger FP32 model to switch between different precision models for higher accuracy, or store a smaller INT8 model but compromise accuracy due to shared quantization parameters. In this paper, we introduce the Double Rounding quantization method, which fully utilizes the quantized representation range to accomplish nearly lossless bit-switching while reducing storage by using the highest integer precision instead of full…
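The storage argument in the abstract can be made concrete with some arithmetic. The sketch below compares three ways of serving several bit-widths: one FP32 master copy, one integer copy at the highest precision, and one separately trained model per bit-width. The 25M parameter count and the bit-width set {8, 6, 4, 2} are hypothetical, weights are assumed to be densely packed, and scales and other per-layer metadata are ignored.

```python
# Hypothetical storage arithmetic for multi-precision deployment.
# Assumptions: weights packed densely at `bits` bits each; scales,
# zero-points, and non-weight tensors are ignored.

BITS_PER_BYTE = 8


def weight_bytes(num_params: int, bits: int) -> int:
    """Bytes for num_params weights stored at `bits` bits each (rounded up)."""
    return (num_params * bits + BITS_PER_BYTE - 1) // BITS_PER_BYTE


def compare_storage(num_params: int, bit_widths: list[int]) -> dict[str, int]:
    """Weight storage in bytes for three ways of serving several bit-widths."""
    highest = max(bit_widths)
    return {
        # one FP32 master copy, re-quantized to each target bit-width
        "fp32_master": weight_bytes(num_params, 32),
        # one integer copy at the highest precision; lower precisions derived from it
        "int_highest": weight_bytes(num_params, highest),
        # one independently trained model per bit-width
        "separate_models": sum(weight_bytes(num_params, b) for b in bit_widths),
    }


if __name__ == "__main__":
    n = 25_000_000  # hypothetical parameter count
    for name, size in compare_storage(n, [8, 6, 4, 2]).items():
        print(f"{name:16s} {size / 1e6:6.1f} MB")
```

Under these assumptions the FP32 master copy takes 100 MB, the single 8-bit copy takes 25 MB (4× smaller), and four separate models take 62.5 MB.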

Code & Models

Repositories

haiduo/Double-Rounding (official, PyTorch)
