1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models

Zeliang Zong; Kai Zhang; Zheyang Li; Wenming Tan; Ye Ren; Yiyan Zhai; Jilin Hu

arXiv:2510.26446 · cs.CL · October 31, 2025

TL;DR

This paper introduces SSLC (Synergistic Sparse and Low-Rank Compression), which combines pruning with low-rank approximation to compress large language models. Reported results include compressing Qwen2.5 by 50% with no performance drop and a speedup of at least 1.63×.

Contribution

The paper formulates sparse and low-rank compression as a single unified optimization problem and reports that the combined approach outperforms standalone pruning and standalone low-rank methods on large language models.

Findings

01 Qwen2.5 compressed by 50% with no performance drop

02 Achieves at least 1.63× speedup

03 Outperforms existing standalone compression methods

Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in language comprehension and generation; however, their widespread adoption is constrained by substantial bandwidth and computational demands. While pruning and low-rank approximation have each demonstrated promising performance individually, their synergy for LLMs remains underexplored. We introduce a Synergistic Sparse and Low-Rank Compression (SSLC) method for LLMs, which leverages the strengths of both techniques: low-rank approximation compresses the model by retaining its essential structure with minimal information loss, whereas sparse optimization eliminates non-essential weights, preserving those crucial for generalization. Based on theoretical analysis, we first formulate the low-rank approximation and sparse optimization as a unified problem and solve it by…
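
The abstract is cut off before it describes how the unified problem is solved, so the following is only an illustrative sketch of the general idea of approximating a weight matrix W as a low-rank part L plus a sparse part S. It is not the SSLC algorithm. It assumes a plain Frobenius-norm objective, a truncated SVD for the low-rank step, magnitude-based selection for the sparse step, and simple alternation between the two; the function names, the `rank` and `keep_fraction` parameters, and the random test matrix are all hypothetical.

```python
import numpy as np

def low_rank_part(M, rank):
    # Best rank-`rank` approximation of M in Frobenius norm (truncated SVD).
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def sparse_part(M, keep_fraction):
    # Keep the largest-magnitude entries of M and zero out the rest.
    k = int(round(keep_fraction * M.size))
    if k == 0:
        return np.zeros_like(M)
    threshold = np.partition(np.abs(M).ravel(), -k)[-k]
    return np.where(np.abs(M) >= threshold, M, 0.0)

def sparse_plus_low_rank(W, rank, keep_fraction, iters=10):
    # Alternate the two steps; each one exactly minimizes ||W - L - S||_F
    # over its own variable, so the error never increases.
    S = np.zeros_like(W)
    for _ in range(iters):
        L = low_rank_part(W - S, rank)          # fit structure left after S
        S = sparse_part(W - L, keep_fraction)   # fit residual left after L
    return L, S

# Hypothetical stand-in for a weight matrix (random, not from a real model).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))

L, S = sparse_plus_low_rank(W, rank=8, keep_fraction=0.25)
err_joint = np.linalg.norm(W - (L + S))
err_low_rank_only = np.linalg.norm(W - low_rank_part(W, 8))
```

In this toy setting `err_joint` is smaller than `err_low_rank_only` because the sparse term absorbs the largest residual entries that the low-rank term misses. Real LLM compression would typically also account for activation statistics and hardware-friendly sparsity patterns, which this sketch leaves out.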
