3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs
Mehdi Makni, Xiang Meng, Rahul Mazumder

TL;DR
This paper introduces 3BASiL-TM, a novel one-shot post-training framework for decomposing large language models into sparse and low-rank components, significantly improving compression quality and efficiency.
Contribution
The paper proposes a new 3-Block ADMM method and a transformer-matching refinement for effective sparse plus low-rank decomposition of LLMs, with convergence guarantees and broad applicability.
Findings
Reduces the WikiText2 perplexity gap relative to the dense model by over 30% compared with prior methods.
Achieves 2.5x faster compression runtime on GPU than prior methods.
Outperforms prior methods in model compression quality.
Abstract
Sparse plus Low-Rank decomposition of Large Language Models (LLMs) has emerged as a promising direction in model compression, aiming to decompose pre-trained model weights into a sum of sparse and low-rank matrices. Despite recent progress, existing methods often suffer from substantial performance degradation compared to dense models. In this work, we introduce 3BASiL-TM, an efficient one-shot post-training method for decomposition of LLMs that addresses this gap. Our approach first introduces a novel 3-Block Alternating Direction Method of Multipliers (ADMM) method, termed 3BASiL, to minimize the layer-wise reconstruction error with convergence guarantees. We then design an efficient transformer-matching (TM) refinement step that jointly optimizes the sparse and low-rank…
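To make the idea of a sparse plus low-rank decomposition concrete, here is a minimal NumPy sketch that splits a weight matrix W into a sparse part S and a low-rank part L. It is not the 3BASiL ADMM method or the TM refinement described in the abstract. It is a plain alternating-projection baseline that minimizes the weight-space error ||W - S - L||_F with no calibration data, and the function names, the unstructured top-k sparsity pattern, and the parameters `k`, `r`, and `n_iter` are all assumptions made for illustration.

```python
import numpy as np

def top_k_magnitude(M, k):
    """Keep the k largest-magnitude entries of M and zero out the rest."""
    flat = np.zeros(M.size, dtype=M.dtype)
    if k > 0:
        src = M.ravel()
        idx = np.argpartition(np.abs(src), -k)[-k:]
        flat[idx] = src[idx]
    return flat.reshape(M.shape)

def truncated_svd(M, r):
    """Best rank-r approximation of M in Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def sparse_plus_low_rank(W, k, r, n_iter=20):
    """Illustrative alternating projections (hypothetical helper, not 3BASiL).

    Each step exactly minimizes ||W - S - L||_F over one block while the
    other is held fixed, so the recorded error never increases.
    """
    S = np.zeros_like(W)
    errors = []
    for _ in range(n_iter):
        L = truncated_svd(W - S, r)       # low-rank fit to the residual
        S = top_k_magnitude(W - L, k)     # sparse fit to what is left
        errors.append(np.linalg.norm(W - S - L))
    return S, L, errors

# Toy stand-in for a layer's weight matrix (random, not from a real model).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))
S, L, errors = sparse_plus_low_rank(W, k=W.size // 2, r=8)
print(f"error after first iteration: {errors[0]:.4f}")
print(f"error after last iteration:  {errors[-1]:.4f}")
```

A layer-wise method such as the one in the abstract would instead measure the error on layer outputs using calibration inputs. The sketch above only shows the structural constraint that S has at most `k` nonzeros and L has rank at most `r`.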
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Tensor decomposition and applications · Generative Adversarial Networks and Image Synthesis
