Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo, Alexandru Drimbarean, James McDermott, Colm O'Riordan

TL;DR
This paper introduces CoRa, a novel low-bit quantization framework that reclaims residual knowledge via an architecture search over low-rank adapters, reaching accuracy comparable to state-of-the-art methods in few optimization iterations.
Contribution
CoRa uniquely models quantization residuals as low-rank adapters and searches for their optimal architecture, significantly reducing search space and computational cost.
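A back-of-the-envelope sketch can make the parameter and search-space argument concrete. It assumes, for illustration only, that an adapter is a rank-r factorization of a convolution kernel reshaped into a (C_out, C_in·k·k) matrix, and that "architecture search" means picking one rank per layer from a small candidate set. The function names and the example sizes (a 256-to-256 3×3 layer, rank 4, 20 layers, 8 rank choices) are hypothetical and are not taken from the paper.

```python
import math

def conv_params(c_out: int, c_in: int, k: int) -> int:
    # Dense k x k convolution kernel, bias ignored.
    return c_out * c_in * k * k

def adapter_params(c_out: int, c_in: int, k: int, rank: int) -> int:
    # Hypothetical rank-r adapter: the kernel is viewed as a (c_out, c_in*k*k)
    # matrix and the residual is approximated by B @ A, with
    # B of shape (c_out, rank) and A of shape (rank, c_in*k*k).
    return rank * (c_out + c_in * k * k)

def log2_weight_search_space(num_weights: int, bits: int) -> int:
    # log2 of the number of ways to pick one of 2**bits codes for every weight.
    return num_weights * bits

def log2_rank_search_space(num_layers: int, num_rank_choices: int) -> float:
    # log2 of the number of ways to pick one adapter rank per layer.
    return num_layers * math.log2(num_rank_choices)

if __name__ == "__main__":
    full = conv_params(256, 256, 3)
    extra = adapter_params(256, 256, 3, rank=4)
    print(full, extra, f"{extra / full:.2%}")
    print(log2_weight_search_space(full, bits=4))
    print(log2_rank_search_space(num_layers=20, num_rank_choices=8))
```

Under these assumptions the rank-4 adapter adds about 1.7% of the layer's parameters, and choosing a rank for each of 20 layers spans 2^60 configurations, versus 2^2,359,296 ways to assign 4-bit codes to the weights of that single layer.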
Findings
CoRa achieves accuracy comparable to state-of-the-art methods in 3- and 4-bit quantization.
It requires fewer than 250 iterations on a small calibration set.
CoRa establishes new efficiency benchmarks in low-bit quantization.
Abstract
This paper explores a novel paradigm for low-bit (i.e., 4 bits or lower) quantization that differs from existing state-of-the-art methods by framing optimal quantization as an architecture search problem within convolutional neural networks (ConvNets). Our framework, dubbed CoRa (Optimal Quantization Residual Convolutional Operator Low-Rank Adaptation), is motivated by two key aspects. First, quantization residual knowledge, i.e., the information lost between floating-point weights and quantized weights, has long been neglected by the research community. Reclaiming the critical residual knowledge, at an infinitesimal extra parameter cost, can reverse performance degradation without training. Second, state-of-the-art quantization frameworks search for optimal quantized weights to address the performance degradation. Yet, the vast search spaces in weight…
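The quantization residual described in the abstract can be computed directly. The sketch below assumes a per-tensor symmetric uniform quantizer and toy weights chosen for illustration; the summary does not specify CoRa's actual quantizer, and the helper names are hypothetical. It shows that the floating-point weights decompose into dequantized weights plus a residual, and that each residual entry is bounded by half the quantization step.

```python
import math

def quantize_symmetric(weights, bits):
    # Hypothetical per-tensor symmetric uniform quantizer with 2**bits - 1 levels.
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    # Map codes back onto the floating-point grid (dequantized weights).
    return [c * scale for c in codes], scale

def quantization_residual(weights, bits):
    # Residual = floating-point weights minus dequantized weights.
    dequant, scale = quantize_symmetric(weights, bits)
    residual = [w - q for w, q in zip(weights, dequant)]
    return dequant, residual, scale

if __name__ == "__main__":
    w = [0.82, -0.31, 0.05, -1.2, 0.47, 0.9]
    for bits in (4, 3):
        _, r, s = quantization_residual(w, bits)
        norm = math.sqrt(sum(e * e for e in r))
        print(f"{bits}-bit: step={s:.4f}, residual L2 norm={norm:.4f}")
```

Lowering the bit width from 4 to 3 coarsens the quantization step, so the residual bound grows; this discarded residual is the knowledge CoRa aims to reclaim through low-rank adapters.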
