SLA2: Sparse-Linear Attention with Learnable Routing and QAT

Jintao Zhang; Haoxu Wang; Kai Jiang; Kaiwen Zheng; Youhe Jiang; Ion Stoica; Jianfei Chen; Jun Zhu; Joseph E. Gonzalez

arXiv:2602.12675 · cs.LG · February 16, 2026


Open Access

TL;DR

SLA2 improves sparse-linear attention in diffusion models by introducing a learnable router, a more faithful sparse-linear attention formulation, and quantization-aware low-bit attention, reaching 97% attention sparsity and an 18.6x attention speedup in video diffusion models without quality loss.

Contribution

SLA2 proposes a learnable router and a more faithful sparse-linear attention formulation, improving both the efficiency and the accuracy of attention in diffusion models.
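The idea of combining a sparse branch and a linear branch with a learnable ratio can be sketched as follows. This is a minimal, forward-only NumPy illustration under stated assumptions, not the authors' implementation: it assumes a block-level boolean mask selects which key blocks each query block attends to exactly (the sparse branch), applies linear attention with the elu(x)+1 feature map to the remaining key blocks, and mixes the two outputs with a ratio alpha = sigmoid(alpha_logit). The function names, the feature map, the block size, and the single scalar ratio (rather than one per head or per block) are hypothetical choices.

```python
import numpy as np

def phi(x):
    # Positive feature map elu(x) + 1 (an assumed, commonly used choice for linear attention).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def sparse_linear_attention(q, k, v, block_mask, alpha_logit, block=16):
    """Hypothetical sketch. q, k: (N, d); v: (N, dv).
    block_mask: (N // block, N // block) bool; True = exact (sparse) branch."""
    n, d = q.shape
    assert n % block == 0, "sequence length must be a multiple of the block size"
    nb = n // block
    # Every query block must keep at least one key block for the sparse softmax.
    assert block_mask.shape == (nb, nb) and block_mask.any(axis=1).all()
    kb = k.reshape(nb, block, d)
    vb = v.reshape(nb, block, -1)
    # Per-key-block summaries for the linear branch: sum phi(k) v^T and sum phi(k).
    kv = np.einsum("bnd,bne->bde", phi(kb), vb)  # (nb, d, dv)
    ksum = phi(kb).sum(axis=1)                    # (nb, d)
    alpha = 1.0 / (1.0 + np.exp(-alpha_logit))    # mixing ratio in (0, 1), trained in practice
    out = np.empty_like(v, dtype=float)
    for i in range(nb):
        qi = q[i * block:(i + 1) * block]
        sel = block_mask[i]
        # Sparse branch: exact softmax attention over the selected key blocks only.
        keys = kb[sel].reshape(-1, d)
        vals = vb[sel].reshape(keys.shape[0], -1)
        s = qi @ keys.T / np.sqrt(d)
        p = np.exp(s - s.max(axis=1, keepdims=True))
        o_sparse = (p / p.sum(axis=1, keepdims=True)) @ vals
        rest = ~sel
        if rest.any():
            # Linear branch: reuse the precomputed summaries of the unselected key blocks.
            fq = phi(qi)
            o_linear = (fq @ kv[rest].sum(axis=0)) / (fq @ ksum[rest].sum(axis=0))[:, None]
            o = alpha * o_sparse + (1.0 - alpha) * o_linear
        else:
            o = o_sparse  # every key block is computed exactly; nothing to mix
        out[i * block:(i + 1) * block] = o
    return out
```

Because the linear branch only needs per-block summaries, its cost per query block is a sum of small d x dv matrices rather than a pass over every skipped key; when the mask selects every block, the function reduces to ordinary softmax attention.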

Findings

01

Achieves 97% attention sparsity in video diffusion models.

02

Provides an 18.6x attention speedup while maintaining quality.

03

Introduces quantization-aware fine-tuning for low-bit attention.
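To illustrate what quantization-aware fine-tuning simulates, here is a hedged sketch using symmetric per-tensor INT8 quantization. The bit width, the per-tensor granularity, and the helper names are assumptions; the source only states that low-bit attention is introduced via quantization-aware fine-tuning. `fake_quant` is the quantize-dequantize step that QAT inserts into the forward pass, and `int8_scores` computes Q K^T from int8 operands with int32 accumulation.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization (assumed granularity): max |x| maps to 127.
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def fake_quant(x):
    # Quantize-dequantize: the forward pass sees the int8 rounding error but stays in float.
    # During QAT, frameworks usually treat round() as identity in the backward pass
    # (straight-through estimator); this forward-only sketch does not model gradients.
    q, scale = quantize_int8(x)
    return q.astype(np.float64) * scale

def int8_scores(q, k):
    # Low-bit Q K^T: int8 operands, int32 accumulation, one float rescale at the end.
    qq, sq = quantize_int8(q)
    kq, sk = quantize_int8(k)
    acc = qq.astype(np.int32) @ kq.astype(np.int32).T
    return acc.astype(np.float64) * (sq * sk)
```

Comparing `int8_scores(q, k)` with `q @ k.T` on random inputs shows a small relative error; fine-tuning with `fake_quant` in the loop lets the model adapt to that error before the low-bit kernel is used at inference time.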

Abstract

Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generation. However, (i) SLA relies on a heuristic split that assigns computations to the sparse or linear branch based on attention-weight magnitude, which can be suboptimal. Additionally, (ii) after formally analyzing the attention error in SLA, we identify a mismatch between SLA and a direct decomposition into sparse and linear attention. We propose SLA2, which introduces (I) a learnable router that dynamically selects whether each attention computation should use sparse or linear attention, (II) a more faithful and direct sparse-linear attention formulation that uses a learnable ratio to combine the sparse and linear attention branches, and (III) a sparse + low-bit attention design, where low-bit attention is introduced via quantization-aware…
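The contrast between point (i) and component (I) can be sketched at the block level. This is a hypothetical forward-pass illustration, not the paper's router: `heuristic_route` ranks key blocks by pooled attention logits (ranking logits within a row is equivalent to ranking its softmax weights) and keeps the top `keep` blocks per query block, which is one way to read a magnitude-based split; `learned_route` scores the same pooled blocks through trainable projection matrices `w_q` and `w_k`, names introduced here for illustration. Training such a router through a hard top-k would need a differentiable relaxation or estimator, which this sketch omits.

```python
import numpy as np

def pool_blocks(x, block):
    # Mean-pool each run of `block` consecutive tokens: (N, d) -> (N // block, d).
    n, d = x.shape
    return x.reshape(n // block, block, d).mean(axis=1)

def topk_mask(scores, k):
    # Boolean mask keeping the k highest-scoring key blocks in each query-block row.
    idx = np.argsort(-scores, axis=1)[:, :k]
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

def heuristic_route(q, k, block, keep):
    # Magnitude-based split: blocks with the largest pooled attention logits go to the sparse branch.
    qp, kp = pool_blocks(q, block), pool_blocks(k, block)
    return topk_mask(qp @ kp.T / np.sqrt(q.shape[1]), keep)

def learned_route(q, k, block, keep, w_q, w_k):
    # Learnable router (hypothetical form): score pooled blocks in a trained projection space.
    qp, kp = pool_blocks(q, block) @ w_q, pool_blocks(k, block) @ w_k
    return topk_mask(qp @ kp.T / np.sqrt(w_q.shape[1]), keep)
```

With `w_q` and `w_k` set to the identity, `learned_route` reproduces `heuristic_route`, so this learnable form can represent the magnitude heuristic as a special case while training is free to move away from it.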


Taxonomy

Topics: Image and Video Quality Assessment · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis