Different Prompts, Different Ranks: Prompt-aware Dynamic Rank Selection for SVD-based LLM Compression
Hengyi Zhu, Zhendong Mi, Grace Li Zhang, Shaoyi Huang

TL;DR
PARSE introduces a prompt-aware rank selection framework for SVD-based LLM compression, enhancing efficiency and accuracy by dynamically choosing ranks based on prompt semantics.
Contribution
It proposes a prompt-aware rank selection method built on a linear router and shared patterns, improving both the accuracy and the inference speed of SVD-based LLM compression.
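To make the routing idea concrete, here is a minimal Python/NumPy sketch of prompt-aware rank selection with a linear router. It is not the authors' implementation. The mean-pooled prompt representation, the router parameters `router_w`/`router_b`, the list `candidate_ranks`, and the helper names `route_rank` and `low_rank_forward` are all hypothetical. The router's offline training and the paper's shared patterns are not modeled. The sketch also routes among prefix ranks only. This summary does not say whether PARSE selects prefixes or groups singular components differently, and the "as experts" framing leaves room for the latter.

```python
import numpy as np


def route_rank(prompt_hidden, router_w, router_b, candidate_ranks):
    """Hypothetical linear router: choose one rank per prompt.

    prompt_hidden:   (seq_len, d) hidden states of the prompt tokens
    router_w:        (d, num_candidates) router weights (assumed trained offline)
    router_b:        (num_candidates,) router bias
    candidate_ranks: allowed ranks, one per router output
    """
    pooled = prompt_hidden.mean(axis=0)    # assumption: mean-pool the prompt tokens
    logits = pooled @ router_w + router_b  # a single linear layer scores each candidate
    return candidate_ranks[int(np.argmax(logits))]


def low_rank_forward(x, u, s, vt, rank):
    """Compute x @ W.T with W approximated by its top-`rank` singular components."""
    h = x @ vt[:rank].T        # project onto the first `rank` right singular vectors
    h = h * s[:rank]           # scale by the kept singular values
    return h @ u[:, :rank].T   # map back with the matching left singular vectors


# Usage: one weight matrix, factorized once; the rank is chosen per prompt.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
u, s, vt = np.linalg.svd(W, full_matrices=False)

prompt = rng.standard_normal((5, 4))   # toy prompt: 5 tokens, hidden size 4
router_w = np.zeros((4, 3))            # toy router whose bias alone decides
router_b = np.array([0.0, 1.0, 0.0])
k = route_rank(prompt, router_w, router_b, [2, 4, 6])  # → 4

x = rng.standard_normal((3, 6))
y = low_rank_forward(x, u, s, vt, k)   # shape (3, 8)
```

Singular values come out of the SVD sorted in descending order. As a result, every candidate rank in this sketch reuses a prefix of the same stored factors. Only the number of components applied changes from prompt to prompt.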
Findings
Up to 10% accuracy improvement at a compression ratio of 0.6.
Achieves 2.5x prefill and 2.4x decoding speedup.
Effective across multiple SVD-based methods.
Abstract
Large language models (LLMs) have rapidly grown in scale, creating substantial memory and computational costs that hinder efficient deployment. Singular value decomposition (SVD) has emerged as an effective post-training compression technique, but existing SVD-based methods rely on static rank truncation, applying a fixed prefix of singular components to all inputs regardless of their diversity. We identify two limitations of this static design: the optimal rank varies across individual prompts, and the selected rank is sensitive to the choice of calibration set, leading to suboptimal performance across diverse inputs. To address these challenges, we propose PARSE, a post-training framework for Prompt-Aware Rank Selection as Experts in SVD-compressed LLMs. PARSE trains a linear router offline to perform prompt-aware rank…
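The static baseline that the abstract criticizes can be sketched in three steps. The SVD is computed once, and a fixed number of leading singular components is kept. Every input is then served by the same truncated factors. This is a hedged NumPy illustration, not code from the paper. In particular, `keep_ratio` (the fraction of the dense parameter count retained) is an assumed convention, since the summary does not state how PARSE defines its compression ratio. The helper name `static_truncate` is also hypothetical. Practical SVD-based compressors usually shape the factorization with a calibration set, and that step is omitted here.

```python
import numpy as np


def static_truncate(w, keep_ratio):
    """Static SVD truncation: one rank k shared by every input (hypothetical helper).

    keep_ratio: fraction of the dense parameter count m*n to keep (assumed convention).
    """
    m, n = w.shape
    # A rank-k factorization stores k*(m + n) numbers instead of m*n,
    # so the largest k within budget is floor(keep_ratio * m * n / (m + n)).
    k = int(keep_ratio * m * n / (m + n))
    k = min(min(m, n), max(1, k))   # keep k a valid rank
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :k] * s[:k]   # (m, k) left factor with singular values folded in
    b = vt[:k]             # (k, n) right factor
    return a, b, k


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
a, b, k = static_truncate(w, keep_ratio=0.5)   # k = floor(0.5 * 2048 / 96) = 10

# The same factors serve every input, whatever the prompt.
x = rng.standard_normal((4, 32))
y = (x @ b.T) @ a.T                            # approximates x @ w.T
```

Here `k` is fixed at compression time. A prompt that would benefit from more components gets the same budget as one that needs fewer. The prompt-aware router is meant to remove exactly this rigidity.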
