AA-SVD: Anchored and Adaptive SVD for Large Language Model Compression
Atul Kumar Sinha, François Fleuret

TL;DR
We propose AA-SVD, a novel low-rank factorization framework for fast, retraining-free compression of large language models that accounts for distribution shifts and refines transformer blocks end-to-end.
Contribution
Our method anchors each compressed layer to the original model's outputs while explicitly modeling the input distribution shift caused by upstream compression, which improves compression quality without retraining.
Findings
AA-SVD outperforms existing SVD-based baselines across various compression ratios.
The compressed model maintains functional equivalence with the original model.
The advantage grows at higher compression levels, where AA-SVD avoids collapse.
Abstract
We introduce a fast low-rank factorization-based framework for compressing large language models that enables rapid compression of billion-parameter models without retraining. Unlike existing factorization-based approaches that optimize only on the original inputs, ignoring distribution shifts from upstream compression and thus propagating errors forward, or those that rely only on shifted inputs and risk drifting away from the original outputs, our approach accounts for both. Beyond individual layer compression, we further refine each transformer block end-to-end, minimizing block-level output distortion and allowing compressed layers to jointly compensate for accumulated errors. By anchoring each compressed layer to the original outputs while explicitly modeling input distribution shifts, our method finds a low-rank approximation that maintains functional equivalence with the original…
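To make the anchoring idea concrete, here is a minimal sketch of one way a single linear layer could be fit under these two constraints. It is not the authors' algorithm: the objective, the function name `anchored_low_rank`, and the closed-form solver are assumptions chosen for illustration. The sketch looks for a rank-`r` weight that, when applied to the shifted inputs `X_shift` (activations produced by the already-compressed upstream layers), reproduces the original outputs `W @ X_orig` as closely as possible in Frobenius norm. It assumes `X_shift` has at least as many samples as features and full row rank, and it does not cover the block-level end-to-end refinement.

```python
import numpy as np

def anchored_low_rank(W, X_orig, X_shift, rank):
    """Hypothetical helper: return factors (A, B) with A @ B of rank <= `rank`
    minimizing ||W @ X_orig - (A @ B) @ X_shift||_F.

    W: (out, d) weight; X_orig, X_shift: (d, n) activations with n >= d.
    """
    Y = W @ X_orig                         # anchor: outputs of the original layer
    # Thin QR of X_shift^T, so X_shift = R^T @ Q^T with Q^T @ Q = I.
    Q, R = np.linalg.qr(X_shift.T)         # Q: (n, d), R: (d, d)
    T = Y @ Q                              # target projected onto the row space of X_shift
    # Best rank-r approximation of T (Eckart-Young) via truncated SVD.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    A = U[:, :rank] * s[:rank]             # (out, rank)
    # Undo the change of variables: B = Vt_r @ inv(R^T), computed by a solve.
    B = np.linalg.solve(R, Vt[:rank].T).T  # (rank, d)
    return A, B

# Usage on synthetic data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
X_orig = rng.standard_normal((6, 50))
X_shift = X_orig + 0.1 * rng.standard_normal((6, 50))  # simulated upstream shift
A, B = anchored_low_rank(W, X_orig, X_shift, rank=3)
err = np.linalg.norm(W @ X_orig - A @ B @ X_shift)
```

Under this objective, the target uses the original inputs and the fit uses the shifted ones, so the layer is tied to the original outputs while adapting to what it will actually receive. Storing `A` and `B` separately costs `rank * (out + d)` parameters instead of `out * d`.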
