Lag-Relative Sparse Attention In Long Context Training

Manlai Liang; Wanyi Huang; Mandi Liu; Huaijun Li; Jinlong Li

arXiv:2506.11498·cs.CL·June 16, 2025

Open Access

TL;DR

This paper introduces Lag-Relative Sparse Attention (LRSA), a long-context training method that selects relevant historical key-value pairs without additional parameters, improving both efficiency and robustness to key-value cache compression.

Contribution

The paper proposes LRSA, anchored by the LagKV compression method, for long-context post-training with compressed key-value caches. It adds no extra parameters and little computational overhead.
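To make the idea of parameter-free key-value selection concrete, here is a minimal sketch in pure Python. It is not the paper's method: this page does not describe the LagKV scoring rule, so the sketch swaps in a plain query-key dot product as a stand-in score. The function name `sparse_attention` and the `keep` parameter are hypothetical, chosen only for illustration.

```python
import math

def sparse_attention(query, keys, values, keep):
    """Attend to only the `keep` highest-scoring cached key-value pairs.

    Assumption: scores are scaled query-key dot products, used here as a
    stand-in. The LagKV score from the paper is NOT reproduced.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

    # Indices of the `keep` largest scores, restored to original token order.
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])

    # Softmax over the selected scores only (max subtracted for stability).
    peak = max(scores[i] for i in kept)
    weights = [math.exp(scores[i] - peak) for i in kept]
    total = sum(weights)

    # Weighted average of the selected value vectors.
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, kept)) / total
        for d in range(dim)
    ]
    return output, kept


# Toy cache of four tokens with 2-dimensional keys and 1-dimensional values.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
output, kept = sparse_attention([1.0, 0.0], keys, values, keep=2)
print(kept, output)
```

The selection step introduces no learned weights: it only ranks existing keys and discards the rest, so attention cost scales with `keep` rather than with the full cache length.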

Findings

01. Significantly improves the robustness of LLMs under key-value cache compression.

02. Achieves better fine-tuned results on question-answering tasks.

03. Reduces the computational and memory costs of long-context attention.

Abstract

Large Language Models (LLMs) have made significant strides in natural language processing and generation, yet their ability to handle long-context input remains constrained by the quadratic complexity of attention computation and the linearly growing key-value memory footprint. To reduce computational and memory costs, key-value cache compression techniques are commonly applied at inference time, but this often leads to severe performance degradation, as models are not trained to handle compressed context. Although more sophisticated compression methods exist, they are typically unsuitable for post-training because of their incompatibility with gradient-based optimization or their high computation overhead. To fill this gap with no additional parameters and little computation overhead, we propose Lag-Relative Sparse Attention (LRSA) anchored by the LagKV compression method for long context…
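The two scaling claims in the abstract can be checked with simple arithmetic. The sketch below estimates the key-value cache size (linear in sequence length) and the number of query-key pairs scored by full causal attention (quadratic in sequence length). The model shape used as defaults (32 layers, 8 key-value heads, head dimension 128, 2 bytes per element) is an assumed, hypothetical configuration, not one taken from the paper.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence.

    The defaults are an assumed model shape, for illustration only.
    """
    # Factor 2: one cached tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def causal_score_count(seq_len):
    """Query-key pairs scored by full causal attention, per head per layer."""
    # Token t attends to tokens 1..t, so the total is 1 + 2 + ... + seq_len.
    return seq_len * (seq_len + 1) // 2


for n in (8_192, 32_768, 131_072):
    gib = kv_cache_bytes(n) / 2**30
    print(f"{n:>7} tokens: KV cache {gib:6.2f} GiB, "
          f"{causal_score_count(n):.3e} score pairs per head per layer")
```

Under these assumed values the cache costs 128 KiB per token, so 8,192 tokens occupy 1 GiB. Doubling the context doubles the cache but roughly quadruples the attention score count, which is why keeping only a fraction of the cached pairs reduces both costs.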

Taxonomy

Topics: Neural Networks and Applications · EEG and Brain-Computer Interfaces · Advanced Memory and Neural Computing

Methods: Focus