ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity

Jiaxi Li, Lu Yin, Li Shen, Jinjin Xu, Yuhui Liu, Wenwu Wang, Shiwei Liu, and Xilu Wang

arXiv:2605.03667·cs.LG·May 6, 2026

TL;DR

ELAS introduces a novel low-rank training framework for large language models that leverages 2:4 activation sparsity to reduce memory usage and accelerate training without significant performance loss.

Contribution

ELAS applies 2:4 structured sparsity to activations in low-rank LLMs, enabling efficient pre-training with reduced memory and computational overhead.
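As a rough illustration of what a 2:4 pattern means, the sketch below zeroes two of every four consecutive activation values. Selecting the two largest-magnitude entries per group is an assumption made here for illustration (it is a common choice for 2:4 pruning); the summary above does not state which selection rule ELAS uses, and `prune_2_4` is a hypothetical helper name.

```python
from typing import List


def prune_2_4(row: List[float]) -> List[float]:
    """Keep the 2 largest-magnitude values in each group of 4; zero the rest.

    Assumption: magnitude-based selection. This is not confirmed to be
    the rule used by ELAS.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    out: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries (ties broken by position).
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


# Hypothetical activation row with two groups of four values.
activations = [0.1, -2.0, 0.5, 1.5, 3.0, 0.0, -0.2, 0.4]
sparse = prune_2_4(activations)
print(sparse)  # [0.0, -2.0, 0.0, 1.5, 3.0, 0.0, 0.0, 0.4]
```

Every group of four ends up with at most two non-zero values, which is the layout that hardware support for the 2:4 structured sparse format expects.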

Findings

01. ELAS maintains model performance with minimal degradation.

02. ELAS accelerates training and inference processes.

03. ELAS significantly reduces activation memory overhead.

Abstract

Large Language Models (LLMs) have achieved remarkable capabilities, but their immense computational demands during training remain a critical bottleneck for widespread adoption. Low-rank training has received attention in recent years due to its ability to significantly reduce training memory usage. Meanwhile, applying 2:4 structured sparsity to weights and activations, to leverage NVIDIA GPU support for the 2:4 structured sparse format, has become a promising direction. However, existing low-rank methods often leave activation matrices at full rank, so activations dominate memory consumption and limit throughput during large-batch training. Furthermore, directly applying sparsity to weights often leads to non-negligible performance degradation. To achieve efficient pre-training of LLMs, this paper proposes ELAS: Efficient pre-training of Low-rank LLMs via 2:4 Activation Sparsity, a novel framework…
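The point about full-rank activations can be made concrete with a little arithmetic. The sketch below is a minimal illustration under assumed sizes: the layer dimensions, rank, and token count are all hypothetical, and the two-factor layout (a d_in-by-rank factor followed by a rank-by-d_out factor) is a generic low-rank parameterization, not necessarily the one ELAS uses.

```python
def dense_params(d_in: int, d_out: int) -> int:
    """Parameter count of a dense d_in x d_out weight matrix."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameter count of two factors: (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)


# Hypothetical sizes chosen only for illustration.
d_in, d_out, rank = 4096, 4096, 256
tokens = 8192  # batch size times sequence length

full = dense_params(d_in, d_out)
low = low_rank_params(d_in, d_out, rank)

# The layer input saved for the backward pass has tokens x d_in elements
# whether the weight is dense or factorized: the rank does not shrink it.
input_activation_elems = tokens * d_in

print(f"dense weight params:    {full}")
print(f"low-rank weight params: {low}")
print(f"input activation elems: {input_activation_elems}")
```

With these assumed numbers the factorized weights are 8 times smaller than the dense weight, while the saved input activation is unchanged and already larger than the dense weight itself. That gap grows with batch size, which is why activation memory becomes the limiting term in large-batch training.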

Code & Models

Repositories

ELAS repository (GitHub)