AA-SVD: Anchored and Adaptive SVD for Large Language Model Compression

Atul Kumar Sinha, François Fleuret

arXiv:2604.02119 · cs.LG · April 3, 2026

TL;DR

We propose AA-SVD, a novel low-rank factorization framework for fast, retraining-free compression of large language models that accounts for distribution shifts and refines transformer blocks end-to-end.

Contribution

Our method anchors each compressed layer to the original model's outputs while explicitly modeling the input distribution shift introduced by upstream compression, which improves compression quality without retraining.

Findings

01

Outperforms existing SVD-based baselines across various compression ratios.

02

Maintains functional equivalence with the original model after compression.

03

Becomes more advantageous at higher compression levels, avoiding collapse.

Abstract

We introduce a fast low-rank factorization-based framework for compressing large language models that enables rapid compression of billion-parameter models without retraining. Unlike existing factorization-based approaches that optimize only on the original inputs, ignoring distribution shifts from upstream compression and thus propagating errors forward, or those that rely only on shifted inputs and risk drifting away from the original outputs, our approach accounts for both. Beyond individual layer compression, we further refine each transformer block end-to-end, minimizing block-level output distortion and allowing compressed layers to jointly compensate for accumulated errors. By anchoring each compressed layer to the original outputs while explicitly modeling input distribution shifts, our method finds a low-rank approximation that maintains functional equivalence with the original…
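The idea of anchoring to the original outputs while fitting on shifted inputs can be illustrated with a small NumPy sketch. It is one plausible reading of the objective and not the authors' algorithm: it assumes a single linear layer `W`, calibration inputs `X_orig` from the uncompressed model, inputs `X_shift` produced by already-compressed upstream layers, and a small ridge term `eps` for numerical stability. The function name `anchored_low_rank` and all parameter names are hypothetical. Under these assumptions, the rank-constrained minimizer of `||W @ X_orig - W_hat @ X_shift||_F` has a closed form via a Cholesky whitening step followed by a truncated SVD.

```python
import numpy as np

def anchored_low_rank(W, X_orig, X_shift, rank, eps=1e-6):
    """Return factors (A, R) with A @ R approximately minimizing
    ||W @ X_orig - (A @ R) @ X_shift||_F over matrices of the given rank.

    Hypothetical helper: illustrates the anchored objective only.
    """
    # Anchor: the outputs the original layer produces on the original inputs.
    Y = W @ X_orig
    d = X_shift.shape[0]
    # Gram matrix of the shifted inputs; eps * I is an assumed ridge term.
    G = X_shift @ X_shift.T + eps * np.eye(d)
    L = np.linalg.cholesky(G)                      # G = L @ L.T
    # Whitened target B = Y @ X_shift.T @ inv(L.T), computed with a solve.
    B = np.linalg.solve(L, (Y @ X_shift.T).T).T
    # The best rank-r approximation of B is its truncated SVD.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    A = U[:, :rank] * s[:rank]                     # shape (out, rank)
    # Undo the whitening on the right factor: R = Vt_r @ inv(L).
    R = np.linalg.solve(L.T, Vt[:rank].T).T        # shape (rank, d)
    return A, R

# Toy comparison against a plain truncated SVD of W.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))
X_orig = rng.standard_normal((32, 256))
X_shift = X_orig + 0.1 * rng.standard_normal((32, 256))  # simulated upstream shift
A, R = anchored_low_rank(W, X_orig, X_shift, rank=4)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_svd = (U[:, :4] * s[:4]) @ Vt[:4]

err_anchored = np.linalg.norm(W @ X_orig - (A @ R) @ X_shift)
err_plain = np.linalg.norm(W @ X_orig - W_svd @ X_shift)
print(err_anchored, err_plain)
```

In this toy setup the anchored factors target the original outputs directly, so their output error on the shifted inputs should not exceed that of the plain truncated SVD of `W`, which ignores the activations entirely. The block-level end-to-end refinement described in the abstract is not covered by this sketch.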
