STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs
Weikang Meng, Liangyu Huo, Yadan Luo, Jiawen Guan, Jingyi Zhang, Yingjian Li, Zheng Zhang

TL;DR
STILL introduces a novel intra-layer hybrid attention method for linearizing large language models, improving token selection accuracy and efficiency while maintaining performance on reasoning tasks and long-context benchmarks.
Contribution
The paper proposes STILL, a new intra-layer hybrid linearization framework with a self-saliency score and norm-preserved feature map, enhancing token selection and preserving pretrained representations.
Findings
Matches or surpasses the original model on reasoning tasks
Achieves up to 86.2% improvement on long-context benchmarks
Efficient hardware implementation with chunk-wise parallelization
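To make the chunk-wise parallelization finding concrete, here is a minimal NumPy sketch of causal linear attention computed chunk by chunk and checked against a quadratic reference. It is not the paper's kernel: the `feature_map` (ELU + 1), the function names, and the chunk size are all assumptions chosen for illustration.

```python
import numpy as np

def feature_map(x):
    # Hypothetical positive feature map (ELU + 1); NOT the map used in STILL.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention_naive(q, k, v):
    # Quadratic reference: out_i = sum_{j<=i} w_ij v_j / sum_{j<=i} w_ij,
    # with w_ij = feature_map(q_i) . feature_map(k_j).
    fq, fk = feature_map(q), feature_map(k)
    scores = np.tril(fq @ fk.T)  # zero out future positions
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def causal_linear_attention_chunked(q, k, v, chunk=4):
    # Same result, but past chunks are folded into a fixed-size running state,
    # so only the intra-chunk part needs a (chunk x chunk) score matrix.
    fq, fk = feature_map(q), feature_map(k)
    n, d = q.shape
    state = np.zeros((d, v.shape[1]))  # sum of fk_j^T v_j over past chunks
    norm = np.zeros(d)                 # sum of fk_j over past chunks
    out = np.empty_like(v)
    for s in range(0, n, chunk):
        e = min(s + chunk, n)
        qc, kc, vc = fq[s:e], fk[s:e], v[s:e]
        local = np.tril(qc @ kc.T)     # causal scores inside the chunk
        num = qc @ state + local @ vc
        den = qc @ norm + local.sum(axis=1)
        out[s:e] = num / den[:, None]
        state += kc.T @ vc             # absorb this chunk into the state
        norm += kc.sum(axis=0)
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(10, 3)) for _ in range(3))
same = np.allclose(causal_linear_attention_naive(q, k, v),
                   causal_linear_attention_chunked(q, k, v, chunk=4))
```

In this sketch the work inside each chunk is a dense matrix product that parallelizes well, while the cross-chunk dependency is carried by a small state whose size does not grow with sequence length.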
Abstract
Linearizing pretrained large language models (LLMs) primarily relies on intra-layer hybrid attention mechanisms to alleviate the quadratic complexity of standard softmax attention. Existing methods perform token routing based on sliding-window partitions, resulting in position-based selection that fails to capture token-specific global importance. Meanwhile, linear attention further suffers from distribution shift caused by learnable feature maps that distort pretrained feature magnitudes. Motivated by these limitations, we propose STILL, an intra-layer hybrid linearization framework for efficiently linearizing LLMs. STILL introduces a Self-Saliency Score with strong local-global consistency, enabling accurate token selection using sliding-window computation, and retains salient tokens for sparse softmax attention while summarizing the remaining context via linear attention. To preserve…
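The general shape of an intra-layer hybrid (salient tokens handled by softmax attention, the rest summarized by linear attention) can be sketched as follows. This is a simplified, non-causal, single-head illustration under stated assumptions, not the STILL method: the saliency input is a placeholder (key norms in the usage lines) rather than the paper's Self-Saliency Score, `norm_preserving_map` is one guessed way to keep feature magnitudes unchanged, and the joint normalization of the two branches is an illustrative choice.

```python
import numpy as np

def norm_preserving_map(x, eps=1e-6):
    # Assumed form: a positive map (ELU + 1) rescaled so that each row keeps
    # the L2 norm of its input. The paper's actual map may differ.
    phi = np.where(x > 0, x + 1.0, np.exp(x))
    x_norm = np.linalg.norm(x, axis=-1, keepdims=True)
    phi_norm = np.linalg.norm(phi, axis=-1, keepdims=True)
    return phi * (x_norm / (phi_norm + eps))

def hybrid_attention(q, k, v, saliency, top_k):
    # saliency: one score per token (placeholder for a learned or derived score).
    # top_k must be >= 1. No max-subtraction is applied before exp, which is
    # acceptable for small demo inputs only.
    n, d = q.shape
    salient = np.zeros(n, dtype=bool)
    salient[np.argsort(saliency)[-top_k:]] = True

    # Sparse softmax branch: exact exponential weights on the salient keys.
    w_sparse = np.exp(q @ k[salient].T / np.sqrt(d))

    # Linear branch: remaining keys are compressed into fixed-size summaries.
    fq = norm_preserving_map(q)
    fk = norm_preserving_map(k[~salient])
    kv = fk.T @ v[~salient]   # (d, d_v) summary of non-salient values
    ksum = fk.sum(axis=0)     # (d,) normalizer summary

    # Joint normalization over both branches (an assumption of this sketch).
    num = w_sparse @ v[salient] + fq @ kv
    den = w_sparse.sum(axis=1) + fq @ ksum
    return num / den[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
saliency = np.linalg.norm(k, axis=1)  # placeholder score, not the paper's
out = hybrid_attention(q, k, v, saliency, top_k=3)
```

When `top_k` equals the sequence length, the linear branch is empty and the sketch reduces to ordinary softmax attention, which gives a simple sanity check for the routing logic.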
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Neural Network Applications
