LoopQ: Quantization for Recursive Transformers
Rui Fang, Hsi-Wen Chen, Ming-Syan Chen

TL;DR
LoopQ is a novel quantization framework designed for recursive Transformer-based language models, addressing distribution shift and error accumulation to improve accuracy and perplexity under low-bit quantization.
Contribution
This paper introduces LoopQ, the first loop-aware post-training quantization method that effectively preserves model accuracy in recursive Transformers.
Findings
LoopQ improves downstream accuracy by 68.8% under W4A4 quantization.
LoopQ reduces average perplexity by 87.7% compared to static PTQ.
Experiments across seven benchmarks validate LoopQ's effectiveness.
Abstract
Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer blocks, enabling deeper computation under a fixed model size. However, this reuse makes LoopLMs more fragile under post-training quantization (PTQ). We present the first systematic study of quantization in LoopLMs and identify three challenges: distribution shift across roles, state reuse across loop transitions, and recursive error accumulation. To address these challenges, we propose LoopQ, a loop-aware PTQ framework that preserves a shared quantized backbone while introducing lightweight adaptations. LoopQ combines activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization to reduce distributional mismatch within loops and error accumulation across loops. Experiments across seven benchmarks show that, under W4A4 quantization, LoopQ improves…
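The recursive error accumulation described in the abstract can be illustrated with a toy example. The sketch below is not LoopQ itself: it assumes a stand-in block `tanh(h @ W)` reused across loops, symmetric per-tensor 4-bit fake quantization, and a hypothetical helper `calibrate_scales` that either reuses one activation scale for every loop (a static baseline) or keeps one scale per loop (a simple form of loop-aware activation scaling). The weights are quantized once and shared in both cases.

```python
import numpy as np

def fake_quantize(x, bits=4, scale=None):
    """Symmetric uniform quantize-dequantize (per-tensor)."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit
    if scale is None:
        scale = np.max(np.abs(x)) / qmax  # dynamic scale from the tensor itself
    scale = max(float(scale), 1e-12)      # guard against division by zero
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def looped_forward(x, w, n_loops, bits=None, act_scales=None):
    """Apply the same toy block n_loops times; return the state after each loop."""
    w_used = fake_quantize(w, bits) if bits else w  # one shared (quantized) weight
    h, states = x, []
    for t in range(n_loops):
        if bits:
            s = act_scales[t] if act_scales is not None else None
            h = fake_quantize(h, bits, s)           # quantize the block input
        h = np.tanh(h @ w_used)                     # toy stand-in for a Transformer block
        states.append(h)
    return states

def calibrate_scales(x_calib, w, n_loops, bits=4, per_loop=True):
    """Hypothetical calibration: record activation ranges along the full-precision trajectory."""
    qmax = 2 ** (bits - 1) - 1
    h, scales = x_calib, []
    for _ in range(n_loops):
        scales.append(np.max(np.abs(h)) / qmax)
        h = np.tanh(h @ w)
    # Static baseline: reuse the first loop's scale everywhere.
    return scales if per_loop else [scales[0]] * n_loops

rng = np.random.default_rng(0)
d, n_loops = 64, 4
w = rng.normal(size=(d, d)) / np.sqrt(d)
x = 3.0 * rng.normal(size=(32, d))

reference = looped_forward(x, w, n_loops)  # full precision
for name, per_loop in [("static", False), ("per-loop", True)]:
    scales = calibrate_scales(x, w, n_loops, per_loop=per_loop)
    quantized = looped_forward(x, w, n_loops, bits=4, act_scales=scales)
    errors = [float(np.mean((q - r) ** 2)) for q, r in zip(quantized, reference)]
    print(name, ["%.4f" % e for e in errors])  # mean squared error after each loop
```

In this toy setup the input to the first loop and the inputs to later loops have different ranges, so a single activation scale fits at most one of them. That is a simplified picture of the distribution shift the abstract refers to; the actual LoopQ components (selective transformation, cross-loop state alignment, trajectory-aware optimization) are not modeled here.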
