STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs

Weikang Meng; Liangyu Huo; Yadan Luo; Jiawen Guan; Jingyi Zhang; Yingjian Li; Zheng Zhang

arXiv:2602.02180 · cs.LG · February 3, 2026

Open Access

TL;DR

STILL introduces a novel intra-layer hybrid attention method for linearizing large language models, improving token selection accuracy and efficiency while maintaining performance on reasoning tasks and long-context benchmarks.

Contribution

The paper proposes STILL, a new intra-layer hybrid linearization framework with a self-saliency score and norm-preserved feature map, enhancing token selection and preserving pretrained representations.

Findings

01 Matches or surpasses original model on reasoning tasks

02 Achieves up to 86.2% improvement on long-context benchmarks

03 Efficient hardware implementation with chunk-wise parallelization

Abstract

Linearizing pretrained large language models (LLMs) primarily relies on intra-layer hybrid attention mechanisms to alleviate the quadratic complexity of standard softmax attention. Existing methods route tokens based on sliding-window partitions, which makes selection position-based and fails to capture token-specific global importance. Meanwhile, linear attention suffers from a distribution shift caused by learnable feature maps that distort pretrained feature magnitudes. Motivated by these limitations, we propose STILL, an intra-layer hybrid linearization framework for efficiently linearizing LLMs. STILL introduces a Self-Saliency Score with strong local-global consistency, which enables accurate token selection using sliding-window computation. It retains salient tokens for sparse softmax attention and summarizes the remaining context via linear attention. To preserve…
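
The split described in the abstract (softmax attention over a few selected tokens, linear attention over the rest) can be illustrated with a minimal NumPy sketch for a single query. This is not the paper's implementation. The `saliency` input is a hypothetical stand-in for the Self-Saliency Score, which the abstract does not define; `feature_map` is a generic `elu(x) + 1` placeholder rather than the norm-preserved feature map; and combining the two branches under one shared normalizer is an assumption made here for illustration.

```python
import numpy as np

def feature_map(x):
    # Placeholder positive feature map, elu(x) + 1 (assumption, not the paper's map).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def hybrid_attention(q, K, V, saliency, k):
    """Hybrid attention for one query.

    q: (d,) query, K: (n, d) keys, V: (n, d_v) values,
    saliency: (n,) hypothetical per-token importance scores,
    k: number of tokens routed to the softmax branch.
    """
    n, d = K.shape
    k = max(0, min(k, n))
    # Route the k highest-scoring tokens to the sparse softmax branch.
    top = np.argsort(saliency)[n - k:]
    mask = np.zeros(n, dtype=bool)
    mask[top] = True

    # Softmax branch: unnormalized exponential weights (no max-subtraction, for clarity).
    w_soft = np.exp(K[mask] @ q / np.sqrt(d))
    num_soft = w_soft @ V[mask]
    den_soft = w_soft.sum()

    # Linear branch: remaining tokens are summarized in a fixed-size state (S, z).
    phi_k = feature_map(K[~mask])
    S = phi_k.T @ V[~mask]          # (d, d_v) key-value summary
    z = phi_k.sum(axis=0)           # (d,) normalizer summary
    phi_q = feature_map(q)
    num_lin = phi_q @ S
    den_lin = phi_q @ z

    # Shared normalizer across both branches (assumed combination rule).
    return (num_soft + num_lin) / (den_soft + den_lin)
```

With `k` equal to the context length the sketch reduces to ordinary softmax attention, and with `k = 0` it reduces to pure linear attention, which makes the two limiting cases easy to check.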

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Neural Network Applications