CLOVER: Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning
Fanxu Meng, Pingzhi Tang, Fan Jiang, Muhan Zhang

TL;DR
CLOVER treats the \( Q \)-\( K \) and \( V \)-\( O \) pairs in each attention head as low-rank decompositions and applies SVD to them. The resulting singular values either guide pruning or are fine-tuned as trainable parameters, and are then merged back into the model so the parameter count does not grow.
Contribution
The paper presents CLOVER, a new method applying SVD to attention layers for effective pruning and fine-tuning, outperforming existing techniques across multiple models and tasks.
Findings
With 70% pruning, CLOVER reaches a perplexity similar to what vanilla pruning reaches at only 8%.
Fine-tuning singular values enhances model performance beyond state-of-the-art methods.
CLOVER improves pruning efficiency and model adaptability across various large models.
Abstract
Decoder-only models generate tokens autoregressively by caching key/value vectors, but as the cache grows, inference becomes memory-bound. To address this issue, we introduce CLOVER (Cross-Layer Orthogonal Vectors), a novel approach that treats pairs of attention layers as a set of low-rank decompositions. CLOVER applies Singular Value Decomposition (SVD) to the \( Q \)-\( K \) and \( V \)-\( O \) pairs within each attention head. The resulting singular values can either guide pruning or serve as trainable parameters for efficient fine-tuning of all orthogonal vectors. After pruning or fine-tuning, these values are reintegrated into the model without increasing its parameter count. We apply CLOVER to various models, including GPT-2 XL, DeepSeek-V2-Lite, Whisper-Large-v3, Stable Diffusion XL, and LLaMA-3.2-11B-Vision. Our results demonstrate that CLOVER significantly improves pruning…
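The decomposition described in the abstract can be illustrated for the \( Q \)-\( K \) pair of a single head. The following is a minimal NumPy sketch, not the authors' implementation: the function names `decompose_qk_head` and `prune_and_merge`, the shapes, and the random weights are all hypothetical, and the sketch assumes no biases and no positional encoding applied between the query and key projections.

```python
import numpy as np

def decompose_qk_head(w_q, w_k):
    """Factor one head's Q-K product with SVD (illustrative helper).

    w_q, w_k: arrays of shape (D, d), the per-head query and key projections.
    Returns U (D, d), S (d,), V (D, d) such that w_q @ w_k.T == U @ diag(S) @ V.T.
    """
    d = w_q.shape[1]
    # The product has rank at most d, so only the first d singular triplets matter.
    u, s, vt = np.linalg.svd(w_q @ w_k.T, full_matrices=False)
    return u[:, :d], s[:d], vt[:d, :].T

def prune_and_merge(u, s, v, rank):
    """Keep the `rank` largest singular values and fold them back into the
    query projection, so no extra parameters remain (illustrative helper)."""
    w_q_new = u[:, :rank] * s[:rank]  # scale each kept column by its singular value
    w_k_new = v[:, :rank]
    return w_q_new, w_k_new

# Hypothetical sizes: model width D = 64, head dimension d = 16.
rng = np.random.default_rng(0)
D, d = 64, 16
w_q = rng.standard_normal((D, d))
w_k = rng.standard_normal((D, d))

u, s, v = decompose_qk_head(w_q, w_k)

# Keeping all d directions reproduces the original attention scores.
w_q_full, w_k_full = prune_and_merge(u, s, v, rank=d)
full_error = np.abs(w_q_full @ w_k_full.T - w_q @ w_k.T).max()

# Keeping only the 8 largest singular values halves the head dimension.
w_q_small, w_k_small = prune_and_merge(u, s, v, rank=8)
pruned_error = np.linalg.norm(w_q @ w_k.T - w_q_small @ w_k_small.T)
```

In this sketch the columns of `u` and `v` are orthonormal, so each singular value in `s` measures how much one direction contributes to the attention scores. Small values are natural candidates for pruning, and `s` could instead be kept as a trainable vector for fine-tuning. The same pattern would apply to the \( V \)-\( O \) pair under the same assumptions.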
Taxonomy
Topics: AI-based Problem Solving and Planning · Rough Sets and Fuzzy Logic
Methods: Attention Is All You Need · Cosine Annealing · Adam · Softmax · Dropout · Linear Warmup With Cosine Annealing · Attention Dropout · Linear Layer · Byte Pair Encoding · Dense Connections
