Different Prompts, Different Ranks: Prompt-aware Dynamic Rank Selection for SVD-based LLM Compression

Hengyi Zhu; Zhendong Mi; Grace Li Zhang; Shaoyi Huang

arXiv:2605.08568 · cs.LG · May 12, 2026

TL;DR

PARSE is a prompt-aware rank selection framework for SVD-based LLM compression. It improves accuracy and inference speed by choosing ranks dynamically, based on prompt semantics, instead of applying one fixed rank to every input.

Contribution

It proposes a prompt-aware rank selection method with a linear router and shared patterns, improving SVD-based LLM compression performance and inference speed.

Findings

01. Up to 10% accuracy improvement at a 0.6 compression ratio.

02. Achieves 2.5x prefill and 2.4x decoding speedup.

03. Effective across multiple SVD-based methods.

Abstract

Large language models (LLMs) have rapidly grown in scale, creating substantial memory and computational costs that hinder efficient deployment. Singular value decomposition (SVD) has emerged as an effective post-training compression technique, but existing SVD-based methods rely on static rank truncation, applying a fixed prefix of singular components to all inputs regardless of their diversity. We identify two limitations of this static design: the optimal rank varies across individual prompts, and the selected rank is sensitive to the choice of calibration set, leading to suboptimal performance across diverse inputs. To address these challenges, we propose PARSE, a post-training framework for Prompt-Aware Rank Selection as Experts in SVD-compressed LLMs. PARSE trains a linear router offline to perform prompt-aware rank…
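To make the contrast in the abstract concrete, the following is a minimal NumPy sketch of static rank truncation next to a prompt-dependent rank choice made by a linear router. It is an illustration under stated assumptions, not the paper's implementation: the toy weight matrix, the candidate rank set `CANDIDATE_RANKS`, the prompt embedding, and the randomly initialized (untrained) router weights are all hypothetical, and the abstract as shown here is truncated before it describes how the router is actually trained or applied.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix standing in for one linear layer, and its SVD.
W = rng.standard_normal((64, 48))
U, S, Vt = np.linalg.svd(W, full_matrices=False)


def truncated_forward(x, rank):
    """Apply the rank-`rank` approximation of W to x.

    Only the leading `rank` singular components are used, which is the
    "fixed prefix" that static truncation applies to every input.
    """
    return U[:, :rank] @ (S[:rank] * (Vt[:rank] @ x))


# Static design: one rank for all inputs (value chosen arbitrarily here).
STATIC_RANK = 16

# Hypothetical prompt-aware design: a linear router scores a small set of
# candidate ranks from a prompt embedding. Weights are random, not trained.
CANDIDATE_RANKS = [8, 16, 32]
EMBED_DIM = 12
router_weight = rng.standard_normal((len(CANDIDATE_RANKS), EMBED_DIM))
router_bias = np.zeros(len(CANDIDATE_RANKS))


def select_rank(prompt_embedding):
    """Pick the candidate rank with the highest router logit."""
    logits = router_weight @ prompt_embedding + router_bias
    return CANDIDATE_RANKS[int(np.argmax(logits))]


prompt_embedding = rng.standard_normal(EMBED_DIM)  # assumed prompt summary vector
x = rng.standard_normal(48)                        # one input activation

dynamic_rank = select_rank(prompt_embedding)
y_static = truncated_forward(x, STATIC_RANK)
y_dynamic = truncated_forward(x, dynamic_rank)
```

In this sketch the router runs once per prompt and its output only changes how many singular components are kept, so the factor matrices `U`, `S`, and `Vt` are shared across all rank choices.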
