SWAA: Sliding Window Attention Adaptation for Efficient and Quality Preserving Long Context Processing
Yijiong Yu, Jiale Liu, Qingyun Wu, Huazheng Wang, Ji Pei

TL;DR
SWAA is a set of adaptable techniques for efficient long-context processing in Transformer models. It combines sliding window attention with strategies that mitigate structural and training mismatches, reporting 30% to 100% inference speedups while maintaining quality.
Contribution
The paper presents SWAA, a versatile toolkit that adapts full-attention models to sliding window attention without extensive pretraining, improving long-context inference efficiency.
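To make the difference between the two attention patterns concrete, here is a minimal sketch of the causal masks involved. It is an illustration under stated assumptions, not the paper's implementation: the function names (`causal_mask`, `sliding_window_mask`, `render`) and the convention that the window includes the current token are chosen here for clarity.

```python
def causal_mask(seq_len):
    """Full-attention (FA) causal mask: query i may attend to every key j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len, window):
    """SWA causal mask: query i attends only to the `window` most recent keys.

    Assumption for this sketch: the window counts the current token itself,
    so query i sees keys i - window + 1 .. i.
    """
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def render(mask):
    """Draw a mask as text: 'x' = attention allowed, '.' = masked out."""
    return "\n".join("".join("x" if allowed else "." for allowed in row) for row in mask)


if __name__ == "__main__":
    print("Full attention:")
    print(render(causal_mask(6)))
    print("Sliding window (window=3):")
    print(render(sliding_window_mask(6, 3)))
```

With a window at least as long as the sequence, the two masks coincide. With a shorter window, early tokens fall out of view for later queries, which is the structural limitation the adaptation strategies target.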
Findings
Achieves 30% to 100% speedups in long-context inference.
Recovers long-context performance when specific strategies are combined.
Provides a flexible framework adaptable to various computational scenarios.
Abstract
The quadratic complexity of self-attention in Transformer-based LLMs renders long-context inference prohibitively expensive. While Sliding Window Attention (SWA), the simplest sparse attention pattern, offers a linear-complexity alternative, it suffers from catastrophic long-context performance collapse. This collapse stems from two fundamental factors: the training-inference mismatch when naively applying SWA to models pretrained with Full Attention (FA), and the inherent structural inability to access distant information when applying SWA to every module at all times. To address these dual challenges, we propose Sliding Window Attention Adaptation (SWAA), a plug-and-play toolkit of recipes that adapts FA models to SWA without costly pretraining. SWAA systematically combines four core strategies to tackle these distinct issues: (1) Full Attention (FA) Decode and (2) Interleaving FA and SWA…
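The quadratic versus linear cost contrast, and the idea of interleaving FA and SWA layers, can be sketched with simple counting. The code below is a hedged illustration only. It counts query-key pairs as a rough proxy for attention cost, and the `interleaved_schedule` helper with its `fa_every` parameter is hypothetical: the excerpt above does not say which layers keep full attention, so the "every k-th layer" rule is an assumption made for this example.

```python
def fa_pairs(n):
    """Query-key pairs under causal full attention: 1 + 2 + ... + n (quadratic in n)."""
    return n * (n + 1) // 2


def swa_pairs(n, w):
    """Query-key pairs under causal sliding window attention with window w.

    The first w queries see 1..w keys; every later query sees exactly w keys,
    so the total grows linearly in n once n > w.
    """
    if w >= n:
        return fa_pairs(n)
    return w * (w + 1) // 2 + (n - w) * w


def interleaved_schedule(num_layers, fa_every):
    """Hypothetical layer plan: every `fa_every`-th layer keeps FA, the rest use SWA."""
    return ["FA" if (layer + 1) % fa_every == 0 else "SWA" for layer in range(num_layers)]


def total_pairs(schedule, n, w):
    """Sum the per-layer pair counts for a given schedule."""
    return sum(fa_pairs(n) if kind == "FA" else swa_pairs(n, w) for kind in schedule)


if __name__ == "__main__":
    n, w = 32_768, 4_096  # illustrative sizes, not taken from the paper
    for fa_every in (1, 2, 4):
        schedule = interleaved_schedule(8, fa_every)
        ratio = total_pairs(schedule, n, w) / total_pairs(["FA"] * 8, n, w)
        print(fa_every, schedule, f"{ratio:.2f} of full-attention pairs")
```

Pair counts ignore memory traffic, kernel efficiency, and the non-attention parts of the model, so these ratios should not be read as the speedups reported in the paper.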
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis
