LaRoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation

Kai Liu; Bowen Xu; Shaoyu Wu; Xin Chen; Hao Zhou; Yongliang Tao; Lulu Hu

arXiv:2507.01299 · cs.CL · January 6, 2026

TL;DR

LaRoSA applies layerwise orthogonal rotations to LLM activations so that they can be sparsified, accelerating inference with minimal performance loss and without additional training or magnitude-based pruning.

Contribution

The paper proposes LaRoSA, a novel activation sparsification method that uses layerwise rotations to achieve consistent sparsity and speed-up in LLMs without extra training.
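Why a rotation can be introduced without retraining: for any orthogonal matrix Q, inserting Q Q^T = I leaves a linear layer's output unchanged, so Q^T can be folded into the weights and sparsification applied to the rotated activation xQ instead of x. The notation below (x, W, Q, and a Top-K operator with budget k) is our own; it is a sketch of the general identity, not the paper's exact formulation.

```latex
% Assumed notation: x \in \mathbb{R}^{1 \times d} (input activation), W \in \mathbb{R}^{d \times m} (weights),
% Q \in \mathbb{R}^{d \times d} orthogonal, i.e. Q Q^\top = I.
y = xW = x\,(QQ^\top)\,W = (xQ)\,(Q^\top W)
% Sparsifying only the rotated activation by keeping its k largest-magnitude entries:
\hat{y} = \operatorname{TopK}_k(xQ)\,(Q^\top W)
```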

Findings

01. Achieves a 1.30x wall-clock speed-up at 40% sparsity on LLaMA2-7B.
02. Keeps the perplexity gap to the dense model at 0.17.
03. Reduces the zero-shot task accuracy gap to the dense model to 0.54%.
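A back-of-envelope reading of the speed-up figure, under an assumption the page does not state: if decoding is memory-bound and a sparse kernel loads only the weight rows that match nonzero activations, a sparsified linear layer moves about (1 - s) of its weight bytes, so its ideal speed-up is 1/(1 - s). Only part of the decode time is spent in sparsified layers, so Amdahl's law caps the end-to-end gain further. The function names and the 0.5 time fraction below are hypothetical illustrations, not measurements from the paper.

```python
def ideal_layer_speedup(sparsity: float) -> float:
    """Upper bound for one memory-bound linear layer that loads only the
    weight rows matching nonzero activations (bytes moved scale with 1 - s)."""
    return 1.0 / (1.0 - sparsity)


def amdahl_speedup(frac_sparsified: float, sparsity: float) -> float:
    """End-to-end bound when only a fraction of decode time is spent in
    sparsified layers (Amdahl's law); the remaining time is unchanged."""
    return 1.0 / ((1.0 - frac_sparsified) + frac_sparsified * (1.0 - sparsity))


print(f"{ideal_layer_speedup(0.4):.2f}x")      # 1.67x
# Hypothetical: half of decode time spent in sparsified layers.
print(f"{amdahl_speedup(0.5, 0.4):.2f}x")      # 1.25x
```

At 40% sparsity the per-layer bound is about 1.67x; a wall-clock figure such as the reported 1.30x also includes time outside the sparsified layers and kernel overheads, so it would be expected to sit below that bound.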

Abstract

Activation sparsity can reduce the computational overhead and memory transfers during the forward pass of Large Language Model (LLM) inference. Existing methods face limitations, either demanding time-consuming recovery training that hinders real-world adoption, or relying on empirical magnitude-based pruning, which causes fluctuating sparsity and unstable inference speed-up. This paper introduces LaRoSA (Layerwise Rotated Sparse Activation), a novel method for activation sparsification designed to improve LLM efficiency without requiring additional training or magnitude-based pruning. We leverage layerwise orthogonal rotations to transform input activations into rotated forms that are more suitable for sparsification. By employing a Top-K selection approach within the rotated activations, we achieve consistent model-level sparsity and reliable wall-clock time speed-up. LaRoSA is…
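A minimal NumPy sketch of the two ingredients named in the abstract: an orthogonal rotation of the input activations (with Q^T folded into the weights) and Top-K selection on the rotated activations. The page does not describe how LaRoSA constructs its layerwise rotations or chooses K per layer, so a random orthogonal matrix from a QR decomposition stands in here; it only demonstrates the mechanics and is not expected to make the activations easier to sparsify. The helpers `topk_sparsify` and `rotated_sparse_linear` are hypothetical names. The last lines contrast Top-K with a fixed magnitude threshold, the kind of empirical pruning the abstract says causes fluctuating sparsity.

```python
import numpy as np


def topk_sparsify(x: np.ndarray, sparsity: float) -> np.ndarray:
    """Keep the k largest-magnitude entries in each row of x and zero the rest.

    k = d - round(sparsity * d), so every row (token) gets the same number of
    zeros, unlike a fixed magnitude threshold.
    """
    d = x.shape[-1]
    k = d - int(round(sparsity * d))
    out = np.zeros_like(x)
    if k == 0:
        return out
    # After argpartition, the last k positions hold the indices of the k largest |x|.
    idx = np.argpartition(np.abs(x), d - k, axis=-1)[..., d - k:]
    np.put_along_axis(out, idx, np.take_along_axis(x, idx, axis=-1), axis=-1)
    return out


def rotated_sparse_linear(x, W, Q, sparsity):
    """Approximate x @ W by sparsifying the rotated activations x @ Q.

    Q.T @ W does not depend on x, so it could be folded into the weights offline.
    """
    return topk_sparsify(x @ Q, sparsity) @ (Q.T @ W)


rng = np.random.default_rng(0)
n, d, m = 8, 50, 32                      # tokens, hidden size, output size
x = rng.standard_normal((n, d))
W = rng.standard_normal((d, m))
# Hypothetical rotation: a random orthogonal matrix from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

y = x @ W
y_sparse = rotated_sparse_linear(x, W, Q, sparsity=0.4)
print("relative error:", np.linalg.norm(y - y_sparse) / np.linalg.norm(y))

xr = x @ Q
# Top-K: identical sparsity (20 of 50 entries zeroed) for every token.
print("Top-K zeros per token:", (topk_sparsify(xr, 0.4) == 0).mean(axis=1))
# Fixed magnitude threshold calibrated to 40% overall: sparsity varies per token.
thr = np.quantile(np.abs(xr), 0.4)
print("threshold zeros per token:", (np.abs(xr) < thr).mean(axis=1))
```

With no sparsification the rotation is exact, since (xQ)(Q^T W) = xW; all of the approximation error comes from the Top-K step.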

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods