SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

Xinhao Huang; You-Liang Huang; Zeyi Wen

arXiv:2604.03258 · cs.CL · April 7, 2026

TL;DR

SoLA is a training-free LLM compression method that keeps the small set of feed-forward components contributing most to inference and compresses the rest with low-rank decomposition, reducing model size while outperforming prior compression methods on language modeling and downstream tasks.

Contribution

It introduces a novel, training-free compression technique combining activation sparsity and adaptive low-rank decomposition for large language models.

Findings

01. Achieves 30% compression on LLaMA-2-70B with improved perplexity and accuracy.

02. Outperforms state-of-the-art methods in language modeling and downstream tasks.

03. Effective across multiple models and benchmarks without additional training.

Abstract

Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their billion-scale parameter counts pose deployment challenges. Although existing methods attempt to reduce the scale of LLMs, they require either special hardware support or expensive post-training to maintain model quality. To facilitate efficient and affordable model slimming, we propose a novel training-free compression method for LLMs, named "SoLA", which leverages Soft activation sparsity and Low-rAnk decomposition. Based on our analysis of the activation pattern in the feed-forward network (FFN) of modern LLMs, SoLA identifies and retains the minority of components that contribute significantly to inference, while compressing the majority through low-rank decomposition. To alleviate the decomposition loss, SoLA is equipped with an adaptive component-wise low-rank…
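To make the size trade-off in the abstract concrete, the sketch below counts parameters for a single FFN weight matrix when a fraction of its neurons is kept dense and the remainder is stored as a low-rank factorization. It is an illustration under stated assumptions, not the paper's procedure: the function names, the `keep_frac` parameter, and the use of one uniform rank are hypothetical, and the abstract's adaptive component-wise rank allocation is not modeled because the text is truncated before describing it.

```python
def dense_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one dense d_ff x d_model FFN weight matrix."""
    return d_model * d_ff


def hybrid_params(d_model: int, d_ff: int, keep_frac: float, rank: int) -> int:
    """Parameter count when some neurons stay dense and the rest are factorized.

    Assumption (hypothetical): `keep_frac` of the d_ff neurons (rows) are
    retained as-is, and the remaining rows are stored as a product of a
    (rest x rank) matrix and a (rank x d_model) matrix.
    """
    kept = round(d_ff * keep_frac)
    rest = d_ff - kept
    low_rank = rank * (rest + d_model)  # two factors of the compressed block
    return kept * d_model + low_rank


def max_rank_for_budget(d_model: int, d_ff: int,
                        keep_frac: float, compression: float) -> int:
    """Largest uniform rank that keeps the matrix within the parameter budget.

    `compression` is the fraction of parameters removed (0.3 means 30%).
    Returns 0 if the dense part alone already exhausts the budget.
    """
    budget = (1 - compression) * dense_params(d_model, d_ff)
    kept = round(d_ff * keep_frac)
    rest = d_ff - kept
    remaining = budget - kept * d_model
    if remaining <= 0:
        return 0
    return int(remaining // (rest + d_model))
```

For example, `max_rank_for_budget(4096, 11008, 0.15, 0.3)` gives the largest rank that fits a 30% reduction for a matrix of that shape when 15% of neurons stay dense; both the shape and the 15% figure are illustrative choices, not values from the paper. Note that factorizing the compressed block only saves parameters when the rank is below `rest * d_model / (rest + d_model)`.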
