HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs

Mehdi Makni; Kayhan Behdin; Zheng Xu; Natalia Ponomareva; Rahul Mazumder

arXiv:2502.00899 · stat.ML · February 4, 2025

TL;DR

HASSLE-free is a unified framework for decomposing the weights of large language models into a sum of (semi-structured) sparse and low-rank matrices, enabling compression and cheaper, faster inference while maintaining performance.
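
To make the "sparse plus low-rank" idea concrete, here is a minimal NumPy sketch, assuming 2:4 semi-structured sparsity along rows (keep 2 of every 4 consecutive weights) and a target rank r. It solves only the simplest, input-agnostic version of the problem, minimizing ‖W − S − L‖_F by alternating exact projections. It is not HASSLE-free's algorithm, which works with a layer-wise reconstruction objective; the function names are hypothetical.

```python
import numpy as np


def prune_n_m(M: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """N:M semi-structured pruning: in every group of m consecutive entries
    of a row, keep the n largest-magnitude entries and zero the rest."""
    rows, cols = M.shape
    assert cols % m == 0, "number of columns must be divisible by m"
    groups = np.abs(M).reshape(rows, cols // m, m)
    drop = np.argsort(groups, axis=-1)[..., : m - n]  # m - n smallest per group
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return M * mask.reshape(rows, cols)


def truncated_svd(M: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of M in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def sparse_plus_low_rank(W: np.ndarray, r: int, n: int = 2, m: int = 4,
                         iters: int = 20):
    """Alternately minimize ||W - S - L||_F over an n:m-sparse S and a rank-r L.
    Each step is an exact minimization over its own block, so the error
    never increases from one step to the next."""
    L = np.zeros_like(W)
    S = np.zeros_like(W)
    for _ in range(iters):
        S = prune_n_m(W - L, n, m)   # best n:m-sparse fit to the residual
        L = truncated_svd(W - S, r)  # best rank-r fit to the residual
    return S, L
```

Calling `sparse_plus_low_rank(W, r=4)` on a weight matrix whose column count is divisible by 4 returns `S` and `L` with `W ≈ S + L`. Because this version ignores the layer's inputs, every weight error counts equally, which is the simplification a layer-wise objective avoids.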

Contribution

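It introduces a local layer-wise reconstruction error objective for sparse plus low-rank decomposition, shows that prior work solves a relaxation of this problem, and provides efficient, scalable methods that minimize the exact objective, outperforming prior methods in efficiency and accuracy for large language models.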
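
As a hedged sketch in our own notation (not necessarily the paper's), assume a linear layer y = Wx with weight W of shape d_out × d_in, calibration inputs stacked as the columns of X (shape d_in × p), a semi-structured sparsity set such as 2:4, and a target rank r. A local layer-wise reconstruction objective for the decomposition, together with the closed form of its low-rank subproblem when H = XXᵀ is positive definite, can then be written as:

```latex
\min_{S,\,L}\;\; \lVert W X - (S + L)\,X \rVert_F^2
\quad \text{s.t.} \quad S \in \mathcal{S}_{2:4}, \;\; \operatorname{rank}(L) \le r

\text{With } H = X X^\top \succ 0 \text{ and } S \text{ fixed:}\quad
\lVert (W - S - L)\,X \rVert_F^2 = \lVert (W - S - L)\,H^{1/2} \rVert_F^2
\;\Longrightarrow\;
L^\star = \mathcal{T}_r\big((W - S)\,H^{1/2}\big)\,H^{-1/2}
```

Here T_r denotes the rank-r truncated SVD; the closed form follows from the Eckart-Young theorem after the rank-preserving change of variables L' = L H^{1/2}. Setting X = I reduces the objective to plain matrix approximation, ‖W − S − L‖_F², which ignores how the layer's inputs weight different errors.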

Findings

01. Reduces test perplexity on WikiText-2 by 12%.
02. Reduces the zero-shot task accuracy gap (relative to the dense model) by 15%.
03. Outperforms state-of-the-art methods in decomposition quality.
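
Perplexity is the exponential of the average per-token negative log-likelihood, so lower is better. The page does not state the baseline for the 12% figure; assuming it is a relative reduction, the arithmetic is sketched below. The helper names are hypothetical, and any numbers passed to them are arbitrary, not results from the paper.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of `new` relative to `baseline` (0.12 means 12%)."""
    return (baseline - new) / baseline
```

For example, an arbitrary baseline perplexity of 10.0 falling to 8.8 is a relative reduction of 0.12, i.e. 12%.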

Abstract

The impressive capabilities of large foundation models come at the cost of substantial computing resources to serve them. Compressing these pre-trained models is of practical interest, as it can democratize their deployment across the machine learning community at large by lowering the costs associated with inference. A promising compression scheme is to decompose foundation models' dense weights into a sum of sparse plus low-rank matrices. In this paper, we design a unified framework, coined HASSLE-free, for (semi-structured) sparse plus low-rank matrix decomposition of foundation models. Our framework introduces the local layer-wise reconstruction error objective for this decomposition; we demonstrate that prior work solves a relaxation of this optimization problem, and we provide efficient and scalable methods to minimize the exact optimization problem we introduce. HASSLE-free substantially…
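
The following NumPy sketch alternates between the sparse and low-rank blocks of a layer-wise objective ‖WX − (S + L)X‖_F². It is an illustration under stated assumptions, not the HASSLE-free algorithm. The low-rank step uses the exact closed form L = T_r((W − S)H^{1/2})H^{-1/2} with H = XXᵀ. The sparse step is a simple swapped-in heuristic: it picks a 2:4 mask from an activation-weighted magnitude score (|weight| times the norm of that input feature over the calibration samples) and then refits the kept weights of each row by least squares. It assumes W has shape (d_out, d_in) with d_in divisible by 4 and X has shape (d_in, p) with full row rank so that H is invertible; all function names are hypothetical.

```python
import numpy as np


def nm_mask(scores: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Boolean n:m mask: in each group of m consecutive entries of a row,
    keep the n entries with the largest scores."""
    rows, cols = scores.shape
    assert cols % m == 0, "number of columns must be divisible by m"
    groups = scores.reshape(rows, cols // m, m)
    drop = np.argsort(groups, axis=-1)[..., : m - n]  # m - n lowest scores
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return mask.reshape(rows, cols)


def layer_error(W, X, S, L) -> float:
    """Layer-wise reconstruction error ||W X - (S + L) X||_F^2."""
    return float(np.linalg.norm((W - S - L) @ X) ** 2)


def sparse_step(M, H, mask):
    """For a fixed mask, refit each row s of S to minimize (t - s)^T H (t - s),
    where t is the matching row of M and s is zero outside the mask:
    s_K = t_K + H_KK^{-1} H_KD t_D (K = kept, D = dropped)."""
    S = np.zeros_like(M)
    for i in range(M.shape[0]):
        keep, drop = mask[i], ~mask[i]
        S[i, keep] = M[i, keep] + np.linalg.solve(
            H[np.ix_(keep, keep)], H[np.ix_(keep, drop)] @ M[i, drop])
    return S


def low_rank_step(M, H_sqrt, H_inv_sqrt, r):
    """Exact minimizer of ||(M - L) X||_F over rank(L) <= r."""
    U, s, Vt = np.linalg.svd(M @ H_sqrt, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r] @ H_inv_sqrt


def alt_min_decompose(W, X, r, n=2, m=4, iters=10):
    """Heuristic alternating minimization of ||W X - (S + L) X||_F^2."""
    H = X @ X.T                               # assumed positive definite
    w, V = np.linalg.eigh(H)
    H_sqrt = (V * np.sqrt(w)) @ V.T
    H_inv_sqrt = (V / np.sqrt(w)) @ V.T
    feat_norms = np.sqrt(np.diag(H))          # norm of each input feature over samples
    L = np.zeros_like(W)
    best_err, best_S, best_L = np.inf, None, None
    for _ in range(iters):
        M = W - L
        mask = nm_mask(np.abs(M) * feat_norms, n, m)  # activation-aware score
        S = sparse_step(M, H, mask)
        L = low_rank_step(W - S, H_sqrt, H_inv_sqrt, r)
        err = layer_error(W, X, S, L)
        if err < best_err:                    # mask changes can raise the error
            best_err, best_S, best_L = err, S, L
    return best_S, best_L
```

Because the heuristic mask can change between iterations, the objective is not guaranteed to decrease monotonically, which is why the sketch returns the best iterate it has seen rather than the last one.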

Taxonomy

Topics: Advanced Data Compression Techniques · Algorithms and Data Compression