Hierarchical Sparse Plus Low Rank Compression of LLM
Pawan Kumar, Aditi Gupta

TL;DR
This paper introduces Hierarchical Sparse Plus Low-Rank (HSS) compression for large language models, combining sparsity and low-rank factorization to reduce memory and computation while maintaining performance.
Contribution
The paper proposes a two-stage HSS compression method that uses a recursive hierarchy and a reverse Cuthill-McKee (RCM) permutation to improve the efficiency and compressibility of LLMs.
Findings
HSS achieves significant memory savings on LLaMA-7B.
HSS maintains state-of-the-art perplexity scores.
HSS outperforms classical sparse-plus-SVD methods.
Abstract
Modern large language models (LLMs) place extraordinary pressure on memory and compute budgets, making principled compression indispensable for both deployment and continued training. We present Hierarchical Sparse Plus Low-Rank (HSS) compression, a two-stage scheme that (i) removes the largest-magnitude weights into a sparse matrix S and (ii) applies a recursive Hierarchically Sparse Separable (HSS) low-rank factorisation to the dense residual matrix. A recursive rank-reducing strategy and a reverse Cuthill-McKee (RCM) permutation are introduced to move high-magnitude weights towards the diagonal, in line with the block-diagonal hierarchy, maximising off-diagonal compressibility (because they are touched only once). HSS is hardware-friendly: its matrix-vector multiply reduces to one sparse multiplication and a sequence of thin-matrix multiplications, and it can be trained end-to-end with standard optimisers. Experiments…
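The two stages described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a square weight matrix, uses a plain recursive 2x2 block split with truncated-SVD off-diagonal blocks (no nested bases, no RCM permutation, no recursive rank reduction), and every name in it (`split_sparse`, `build_tree`, `tree_matvec`, `keep_frac`, `min_size`) is hypothetical.

```python
import numpy as np


def split_sparse(W, keep_frac):
    """Stage (i): move the largest-magnitude entries of W into S.

    Returns (S, R) with W == S + R. keep_frac is an illustrative parameter,
    not one taken from the paper.
    """
    k = max(1, int(keep_frac * W.size))
    # The k-th largest absolute value serves as the cut-off.
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    S = np.where(np.abs(W) >= thresh, W, 0.0)
    return S, W - S


def truncated_svd(A, rank):
    """Return thin factors (U, V) such that U @ V approximates A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = min(rank, len(s))
    return U[:, :r] * s[:r], Vt[:r, :]


def build_tree(R, rank, min_size):
    """Stage (ii), simplified: recurse on diagonal blocks and store each
    off-diagonal block as a pair of thin low-rank factors."""
    n = R.shape[0]
    if n <= min_size:
        return {"dense": R.copy()}
    h = n // 2
    return {
        "h": h,
        "tl": build_tree(R[:h, :h], rank, min_size),
        "br": build_tree(R[h:, h:], rank, min_size),
        "tr": truncated_svd(R[:h, h:], rank),
        "bl": truncated_svd(R[h:, :h], rank),
    }


def tree_matvec(node, x):
    """Multiply the hierarchical representation by x using thin products."""
    if "dense" in node:
        return node["dense"] @ x
    h = node["h"]
    x1, x2 = x[:h], x[h:]
    (U_tr, V_tr), (U_bl, V_bl) = node["tr"], node["bl"]
    y1 = tree_matvec(node["tl"], x1) + U_tr @ (V_tr @ x2)
    y2 = tree_matvec(node["br"], x2) + U_bl @ (V_bl @ x1)
    return np.concatenate([y1, y2])


# Usage: approximate W @ x as (sparse part) + (hierarchical low-rank part).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
x = rng.standard_normal(64)
S, R = split_sparse(W, keep_frac=0.05)
tree = build_tree(R, rank=4, min_size=8)
y_approx = S @ x + tree_matvec(tree, x)
rel_err = np.linalg.norm(y_approx - W @ x) / np.linalg.norm(W @ x)
```

In this sketch `S` is stored densely for brevity; a practical version would keep it in a sparse format so that the first term is a genuine sparse matrix-vector product. A random Gaussian matrix has little low-rank structure, so `rel_err` here mainly shows how the pieces fit together rather than how well the scheme compresses real weights.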
Taxonomy
Topics: Natural Language Processing Techniques · Big Data and Digital Economy · Topic Modeling
