Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Kan Zhu, Tian Tang, Qinyu Xu, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci

TL;DR
Tactic introduces an adaptive sparse attention mechanism for long-context LLMs that dynamically selects tokens based on attention importance, improving efficiency and accuracy over fixed-budget methods.
Contribution
It proposes a novel, calibration-free sparse attention method using clustering and distribution fitting to adaptively select tokens based on attention scores.
Findings
Achieves up to 7.29x speedup in decode attention
Outperforms existing sparse attention algorithms in accuracy
Provides a 1.58x overall inference speedup
Abstract
Long-context models are essential for many applications but face inefficiencies in loading large KV caches during decoding. Prior methods enforce fixed token budgets for sparse attention, assuming a set number of tokens can approximate full attention. However, these methods overlook variations in the importance of attention across heads, layers, and contexts. To address these limitations, we propose Tactic, a sparsity-adaptive and calibration-free sparse attention mechanism that dynamically selects tokens based on their cumulative attention scores rather than a fixed token budget. By setting a target fraction of total attention scores, Tactic ensures that token selection naturally adapts to variations in attention sparsity. To efficiently approximate this selection, Tactic leverages clustering-based sorting and distribution fitting, allowing it to accurately estimate token importance…
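To make the selection rule concrete, here is a minimal sketch of picking tokens by cumulative attention score instead of a fixed budget. It is an exact, brute-force version written for illustration only: it computes full softmax weights and sorts them, which is precisely the cost that Tactic is described as avoiding through clustering-based sorting and distribution fitting. Those approximation steps are not implemented here. The function names, the example logits, and the 0.9 target fraction are all assumptions, not values from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of attention logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_by_cumulative_score(logits, target_fraction):
    """Return (indices, mass): the smallest set of tokens whose summed
    attention weight reaches target_fraction, and the mass it covers.

    Hypothetical helper for illustration; exact rather than approximate.
    """
    weights = softmax(logits)
    # Visit tokens from highest to lowest attention weight.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    selected, mass = [], 0.0
    for i in order:
        selected.append(i)
        mass += weights[i]
        if mass >= target_fraction:
            break
    return selected, mass


# Two made-up heads over 8 tokens: one sharply peaked, one uniform.
peaked_head = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
flat_head = [0.0] * 8

peaked_tokens, peaked_mass = select_by_cumulative_score(peaked_head, 0.9)
flat_tokens, flat_mass = select_by_cumulative_score(flat_head, 0.9)

print(len(peaked_tokens), len(flat_tokens))
```

With the same 0.9 target, the peaked head is covered by a single token while the uniform head needs all eight. A fixed budget of, say, four tokens would over-select for the first head and under-cover the second, which is the mismatch across heads, layers, and contexts that the abstract points to.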
Taxonomy
Topics: Data Stream Mining Techniques · Data Mining Algorithms and Applications
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
