Quantization without Tears

Minghao Fu; Hao Yu; Jie Shao; Junjie Zhou; Ke Zhu; Jianxin Wu

arXiv:2411.13918 · cs.CV · July 9, 2025

TL;DR

QwT is a simple, fast, and general network quantization method. It adds a lightweight structure to the quantized network to recover accuracy, needs little hyperparameter tuning, and applies to diverse tasks.

Contribution

The paper presents QwT, a quantization approach that adds a small linear structure to the quantized network to improve accuracy while keeping the method simple, fast, and broadly applicable.

Findings

01 Effective across vision, language, and multimodal tasks

02 Achieves high accuracy with minimal hyperparameter tuning

03 Provides a closed-form solution for quick improvements
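
The closed-form idea in the third finding can be illustrated with a toy least-squares fit. This is only a sketch under stated assumptions, not the authors' implementation: it assumes the lightweight linear structure is an affine map fitted so that the quantized output plus the map approximates the full-precision output. The function name `fit_linear_compensation`, the ridge term, the `tanh` toy block, and the crude rounding used as a stand-in for weight quantization are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_linear_compensation(features, residual, ridge=1e-6):
    """Closed-form ridge fit of residual ~= features @ W + b (hypothetical helper)."""
    n = features.shape[0]
    # Append a column of ones so the bias is solved together with the weights.
    X = np.hstack([features, np.ones((n, 1))])
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    params = np.linalg.solve(gram, X.T @ residual)
    return params[:-1], params[-1]

# Toy calibration data: a full-precision block and a coarsely rounded copy of it.
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 16))
A = rng.normal(size=(16, 16))
A_q = np.round(A * 4) / 4          # crude stand-in for weight quantization
y_fp = np.tanh(x @ A)              # full-precision block output
y_q = np.tanh(x @ A_q)             # quantized block output

# Fit the affine correction to the gap between the two outputs, then apply it.
W, b = fit_linear_compensation(x, y_fp - y_q)
y_comp = y_q + x @ W + b

err_before = np.mean((y_fp - y_q) ** 2)
err_after = np.mean((y_fp - y_comp) ** 2)
print(f"MSE before: {err_before:.6f}, after: {err_after:.6f}")
```

Because the fit is a single linear solve on calibration data, no iterative training or learning-rate tuning is involved in this sketch, which is the property the finding refers to as "quick".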

Abstract

Deep neural networks, while achieving remarkable success across diverse tasks, demand significant resources, including computation, GPU memory, bandwidth, storage, and energy. Network quantization, as a standard compression and acceleration technique, reduces storage costs and enables potential inference acceleration by discretizing network weights and activations into a finite set of integer values. However, current quantization methods are often complex and sensitive, requiring extensive task-specific hyperparameters, where even a single misconfiguration can impair model performance, limiting generality across different models and tasks. In this paper, we propose Quantization without Tears (QwT), a method that simultaneously achieves quantization speed, accuracy, simplicity, and generality. The key insight of QwT is to incorporate a lightweight additional structure into the quantized…
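
To make "discretizing network weights and activations into a finite set of integer values" concrete, here is a minimal sketch of symmetric uniform quantization in NumPy. It is a generic textbook scheme chosen for illustration, not necessarily the quantizer used in the paper; the function names and the 8-bit default are assumptions.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Map a float array to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    # Guard against an all-zero tensor, where the scale would otherwise be zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original floats from the integer codes."""
    return q.astype(np.float64) * scale

weights = np.random.default_rng(0).normal(size=(4, 8))
q, scale = quantize_symmetric(weights, num_bits=8)
max_error = np.max(np.abs(weights - dequantize(q, scale)))
print(f"scale={scale:.6f}, max reconstruction error={max_error:.6f}")
```

The integer codes are what gets stored, and the rounding error (at most half a quantization step here) is the information loss that a quantization method has to keep from degrading accuracy.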

Code & Models

Repositories

wujx2001/QwT (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Data Compression Techniques

Methods: Sparse Evolutionary Training