Lag-Relative Sparse Attention In Long Context Training
Manlai Liang, Wanyi Huang, Mandi Liu, Huaijun Li, Jinlong Li

TL;DR
This paper introduces Lag-Relative Sparse Attention (LRSA), a novel method for long-context training that improves efficiency and robustness by selecting relevant historical key-value pairs without additional parameters.
Contribution
The paper proposes LRSA, anchored by the LagKV compression method, which makes long-context post-training practical without extra parameters or high computational cost.
Findings
Significantly improves the robustness of LLMs under key-value cache compression.
Achieves better fine-tuned results in question-answer tasks.
Reduces computational and memory costs in long-context attention.
Abstract
Large Language Models (LLMs) have made significant strides in natural language processing and generation, yet their ability to handle long-context input remains constrained by the quadratic complexity of attention computation and the linearly growing key-value memory footprint. To reduce computational and memory costs, key-value cache compression techniques are commonly applied at inference time, but this often leads to severe performance degradation, as models are not trained to handle compressed context. Although there are more sophisticated compression methods, they are typically unsuitable for post-training because of their incompatibility with gradient-based optimization or their high computational overhead. To fill this gap with no additional parameters and little computational overhead, we propose Lag-Relative Sparse Attention (LRSA) anchored by the LagKV compression method for long context…
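The scaling argument in the abstract can be made concrete with a small back-of-the-envelope sketch. It is not taken from the paper: the function names and the model shape (32 layers, 8 key-value heads, head dimension 128, 2 bytes per value) are hypothetical choices for illustration only. It shows that the number of attention score entries grows quadratically with sequence length, while an uncompressed key-value cache grows linearly.

```python
def attention_score_entries(seq_len: int) -> int:
    """Query-key score entries for one head under full (unmasked) self-attention."""
    return seq_len * seq_len


def kv_cache_bytes(
    seq_len: int,
    n_layers: int = 32,        # hypothetical model depth
    n_kv_heads: int = 8,       # hypothetical number of key-value heads
    head_dim: int = 128,       # hypothetical per-head dimension
    bytes_per_value: int = 2,  # e.g. 16-bit storage (assumption)
) -> int:
    """Bytes needed to cache keys and values for one sequence, without compression."""
    tensors_per_layer = 2  # one key tensor and one value tensor
    return tensors_per_layer * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    for n in (4_096, 8_192, 16_384):
        print(n, attention_score_entries(n), kv_cache_bytes(n))
```

Under these assumptions, doubling the sequence length quadruples the score entries but only doubles the cache size, which is why the abstract treats compute and memory as separate constraints.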
Taxonomy
Topics: Neural Networks and Applications · EEG and Brain-Computer Interfaces · Advanced Memory and Neural Computing
