Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression
Ruoling Qi, Yirui Liu, Xuaner Wu, Xiangyu Wang, Ming Li, Chen Chen, Jian Chen, Yin Chen, Qizhen Weng

TL;DR
Swift-SVD is a low-rank compression method for LLMs that combines theoretical optimality, practical efficiency, and numerical stability, reducing the memory and bandwidth costs of static weights and the Key-Value cache.
Contribution
It introduces a closed-form, activation-aware compression framework with dynamic rank allocation, outperforming existing methods in accuracy and speed.
Findings
Achieves 3× to 70× speedups in compression time.
Outperforms state-of-the-art baselines in accuracy.
Evaluated across six LLMs and eight datasets.
Abstract
The deployment of Large Language Models is constrained by the memory and bandwidth demands of static weights and dynamic Key-Value cache. SVD-based compression provides a hardware-friendly solution to reduce these costs. However, existing methods suffer from two key limitations: some are suboptimal in reconstruction error, while others are theoretically optimal but practically inefficient. In this paper, we propose Swift-SVD, an activation-aware, closed-form compression framework that simultaneously guarantees theoretical optimum, practical efficiency and numerical stability. Swift-SVD incrementally aggregates covariance of output activations given a batch of inputs and performs a single eigenvalue decomposition after aggregation, enabling training-free, fast, and optimal layer-wise low-rank approximation. We employ effective rank to analyze local layer-wise compressibility and design a…
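The aggregation-then-decomposition idea described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of an uncentered second moment as the "covariance", and the entropy-based definition of effective rank are all choices made here for the example. For a linear layer with weight W and calibration inputs X, the outputs are Y = X Wᵀ; projecting W onto the top-k eigenvectors of YᵀY gives a rank-k weight whose output error matches the Eckart-Young optimum for Y.

```python
import numpy as np


def aggregate_output_covariance(weight, input_batches):
    """Accumulate Y^T Y batch by batch, where Y = X @ weight.T.

    Assumption: "covariance" is taken as the uncentered second moment
    of the layer outputs; the paper may define it differently.
    """
    d_out = weight.shape[0]
    cov = np.zeros((d_out, d_out), dtype=np.float64)
    for x in input_batches:
        y = x @ weight.T          # (n, d_out) output activations
        cov += y.T @ y            # only a d_out x d_out matrix is kept
    return cov


def low_rank_factors(weight, cov, rank):
    """Single eigendecomposition, then project the weight onto the top subspace."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    eigvals = eigvals[::-1]                  # reorder to descending
    top = eigvecs[:, ::-1][:, :rank]         # (d_out, rank) leading eigenvectors
    a = top                                  # left factor
    b = top.T @ weight                       # right factor, (rank, d_in)
    return a, b, eigvals


def effective_rank(eigvals):
    """Entropy-based effective rank of the output spectrum (assumed definition)."""
    s = np.sqrt(np.clip(eigvals, 0.0, None))  # singular values of Y
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))


# Hypothetical toy layer and calibration data.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))                        # (d_out, d_in)
batches = [rng.standard_normal((64, 32)) for _ in range(4)]

cov = aggregate_output_covariance(W, batches)
A, B, eigvals = low_rank_factors(W, cov, rank=4)

# Output reconstruction error of the compressed layer W_k = A @ B.
X = np.vstack(batches)
err = np.linalg.norm(X @ W.T - X @ (A @ B).T) ** 2
optimum = eigvals[4:].sum()   # sum of discarded eigenvalues (Eckart-Young bound)
print(np.isclose(err, optimum, rtol=1e-6), effective_rank(eigvals))
```

Because only the d_out × d_out accumulator is stored, the calibration activations never need to be held in memory at once, and the layer is decomposed exactly once after all batches are seen. How the effective rank feeds into a per-layer rank budget is not specified in the text above, so the sketch stops at computing it.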
