Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

Zecheng Tang; Quantong Qiu; Yi Yang; Zhiyi Hong; Haiya Xiang; Kebin Liu; Qingqing Dang; Juntao Li; Min Zhang

arXiv:2601.17367·cs.CL·January 29, 2026

TL;DR

Elastic Attention introduces a dynamic, input-dependent sparsity mechanism for transformers, significantly improving efficiency and performance in long-context scenarios by adapting attention modes during inference.

Contribution

We propose Elastic Attention, a novel method that dynamically adjusts attention sparsity ratios at test time using an Attention Router, enhancing scalability and adaptability of large language models.

Findings

01. Achieves strong performance with efficient inference on long-context benchmarks.

02. Enables dynamic adjustment of attention modes during inference.

03. Requires only 12 hours of training on 8xA800 GPUs.

Abstract

The quadratic complexity of standard attention mechanisms poses a significant scalability bottleneck for large language models (LLMs) in long-context scenarios. While hybrid attention strategies that combine sparse and full attention within a single model offer a viable solution, they typically employ static computation ratios (i.e., fixed proportions of sparse versus full attention) and fail to adapt to the varying sparsity sensitivities of downstream tasks during inference. To address this issue, we propose Elastic Attention, which allows the model to dynamically adjust its overall sparsity based on the input. This is achieved by integrating a lightweight Attention Router into the existing pretrained model, which dynamically assigns each attention head to different computation modes. Within only 12 hours of training on 8xA800 GPUs, our method enables models to achieve both strong…
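To make the routing idea in the abstract concrete, here is a minimal sketch of per-head mode assignment and its effect on attention cost. It is not the authors' implementation: the router here is a hypothetical linear scorer over a pooled input vector with a hard sign threshold, the sparse mode is assumed to be a sliding window, and all names (`route_heads`, `router_weights`, `window`) are illustrative. The page does not describe the real router architecture, its training objective, or the sparse pattern.

```python
def route_heads(pooled, router_weights):
    """Assign each attention head to 'full' or 'sparse' mode.

    pooled:         list[float], hypothetical summary of the input
                    (e.g. mean-pooled hidden states).
    router_weights: list[list[float]], one hypothetical weight vector per head.
    Assumption: a head runs full attention when its router logit is >= 0.
    """
    modes = []
    for w in router_weights:
        logit = sum(p * x for p, x in zip(pooled, w))
        modes.append("full" if logit >= 0.0 else "sparse")
    return modes


def sparsity_ratio(modes):
    """Fraction of heads routed to the sparse mode for this input."""
    return sum(m == "sparse" for m in modes) / len(modes)


def attention_cost(modes, seq_len, window):
    """Count query-key score computations across heads.

    Assumptions: full attention costs seq_len * seq_len (non-causal count,
    for simplicity); sparse attention is a sliding window over at most
    `window` keys per query.
    """
    full = seq_len * seq_len
    sparse = seq_len * min(window, seq_len)
    return sum(full if m == "full" else sparse for m in modes)


# Toy input: 4 heads, 2-dimensional pooled feature.
pooled = [1.0, -2.0]
router_weights = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]]

modes = route_heads(pooled, router_weights)
ratio = sparsity_ratio(modes)
cost = attention_cost(modes, seq_len=1000, window=100)
dense_cost = attention_cost(["full"] * len(modes), seq_len=1000, window=100)
print(modes, ratio, cost, dense_cost)
```

In this toy setting only the first head stays in full mode, so the sparsity ratio is 0.75 and the score count drops from 4,000,000 to 1,300,000. A different `pooled` vector would change the routing, which is the sense in which the overall sparsity depends on the input rather than on a fixed ratio.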

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning