Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization

Róisín Luo; Alexandru Drimbarean; James McDermott; Colm O'Riordan

arXiv:2408.00923·cs.CV·April 28, 2026

TL;DR

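This paper introduces CoRa, a low-bit quantization framework for ConvNets that reclaims residual knowledge (the information lost between floating-point and quantized weights) by searching over the architecture of low-rank adapters, reaching accuracy comparable to state-of-the-art methods in fewer than 250 iterations.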

Contribution

CoRa models quantization residuals as low-rank adapters and searches for the adapters' optimal architecture rather than for optimal quantized weights, which shrinks both the search space and the computational cost.

Findings

01

CoRa achieves comparable accuracy to state-of-the-art methods in 3- and 4-bit quantization.

02

It requires fewer than 250 iterations on a small calibration set.

03

CoRa establishes new efficiency benchmarks in low-bit quantization.

Abstract

This paper explores a novel paradigm in low-bit (i.e., 4 bits or lower) quantization, differing from existing state-of-the-art methods, by framing optimal quantization as an architecture search problem within convolutional neural networks (ConvNets). Our framework, dubbed CoRa (Optimal Quantization Residual Convolutional Operator Low-Rank Adaptation), is motivated by two key aspects. Firstly, quantization residual knowledge, i.e., the information lost between floating-point weights and quantized weights, has long been neglected by the research community. Reclaiming the critical residual knowledge, with an infinitesimal extra parameter cost, can reverse performance degradation without training. Secondly, state-of-the-art quantization frameworks search for optimal quantized weights to address the performance degradation. Yet, the vast search spaces in weight…
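
To make the idea of a quantization residual concrete, here is a minimal NumPy sketch. It is not the CoRa algorithm: the abstract does not specify how the adapters are built or searched, so the sketch assumes a symmetric per-tensor uniform quantizer and uses a truncated SVD of the residual as a stand-in for a low-rank adapter. The function names (`quantize_uniform`, `low_rank_residual`), the weight shape, and the candidate ranks are all hypothetical choices made for illustration.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric per-tensor round-to-nearest quantization (an assumed quantizer).

    Returns the dequantized weights, i.e. values on the quantization grid.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def low_rank_residual(w, w_q, rank):
    """Factor the residual w - w_q as B @ A using a truncated SVD.

    This is an illustrative stand-in for a low-rank adapter, not the
    paper's procedure. B has shape (out, rank), A has shape (rank, in).
    """
    u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
    b = u[:, :rank] * s[:rank]
    a = vt[:rank, :]
    return b, a

# Hypothetical layer: 64 output channels, 16 input channels, 3x3 kernel,
# flattened to a 64 x 144 matrix. Random weights are used only as a placeholder.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 144))
w_q = quantize_uniform(w, bits=4)

def rel_error(rank):
    """Relative Frobenius error after adding a rank-`rank` residual correction."""
    b, a = low_rank_residual(w, w_q, rank)
    return np.linalg.norm(w - (w_q + b @ a)) / np.linalg.norm(w)

base_error = np.linalg.norm(w - w_q) / np.linalg.norm(w)  # no adapter
errors = {r: rel_error(r) for r in (1, 4, 16)}
# Extra parameters stored by the two factors at each rank.
extra_params = {r: r * (w.shape[0] + w.shape[1]) for r in errors}

for r in errors:
    print(f"rank={r:2d}  extra_params={extra_params[r]:5d}  rel_error={errors[r]:.4f}")
print(f"no adapter: rel_error={base_error:.4f}")
```

The rank is the knob that trades extra parameters against recovered residual, which is the kind of per-layer choice an architecture search could explore. On random placeholder weights the residual has little low-rank structure, so the gain per rank is modest; the sketch only shows the mechanics and says nothing about the accuracy figures reported for CoRa.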
