Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers

Yifan Zhou; Zeqi Xiao; Tianyi Wei; Shuai Yang; Xingang Pan

arXiv:2512.16615·cs.CV·December 19, 2025

TL;DR

This paper introduces Log-linear Sparse Attention (LLSA), a hierarchical, trainable sparse attention mechanism that reduces both selection and attention costs from quadratic to log-linear complexity in diffusion transformers handling long token sequences, enabling faster training and inference.

Contribution

The paper proposes LLSA, a novel hierarchical sparse attention method with a GPU-efficient implementation, improving scalability and efficiency of diffusion transformers for long sequences.

Findings

1. LLSA accelerates attention inference by 28.27x.
2. LLSA speeds up DiT training by 6.09x.
3. LLSA maintains high-quality image generation on high-resolution data.

Abstract

Diffusion Transformers (DiTs) set the state of the art in visual generation, yet their quadratic self-attention cost fundamentally limits scaling to long token sequences. Recent Top-K sparse attention approaches reduce the computation of DiTs by compressing tokens into block-wise representation and selecting a small set of relevant key blocks, but still suffer from (i) quadratic selection cost on compressed tokens and (ii) increasing K required to maintain model quality as sequences grow. We identify that their inefficiency is due to the single-level design, as a single coarse level is insufficient to represent the global structure. In this paper, we introduce Log-linear Sparse Attention (LLSA), a trainable sparse attention mechanism for extremely long token sequences that reduces both selection and attention costs from quadratic to log-linear complexity by utilizing a hierarchical…
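The single-level Top-K baseline that the abstract criticizes can be illustrated with a minimal NumPy sketch. This is not LLSA and not the authors' implementation; the function name `topk_block_sparse_attention`, the use of mean pooling for block compression, and the parameters `block_size` and `top_k` are all assumptions chosen for illustration. The sketch makes the two costs visible: block selection scores every query block against every key block (quadratic in the number of blocks), and each query block then attends only to its `top_k` selected key blocks.

```python
import numpy as np

def topk_block_sparse_attention(q, k, v, block_size, top_k):
    """Illustrative single-level Top-K block-sparse attention (hypothetical baseline, not LLSA)."""
    n, d = q.shape
    assert n % block_size == 0, "sequence length must be a multiple of block_size"
    num_blocks = n // block_size

    # Compress tokens into block-wise representations (mean pooling is an assumption).
    q_blocks = q.reshape(num_blocks, block_size, d).mean(axis=1)
    k_blocks = k.reshape(num_blocks, block_size, d).mean(axis=1)

    # Selection: every query block is scored against every key block.
    # This step costs O(num_blocks^2 * d), i.e. it is still quadratic.
    block_scores = q_blocks @ k_blocks.T
    selected = np.argsort(-block_scores, axis=1)[:, :top_k]  # (num_blocks, top_k)

    out = np.empty_like(v)
    scale = 1.0 / np.sqrt(d)
    for i in range(num_blocks):
        rows = slice(i * block_size, (i + 1) * block_size)
        # Token indices covered by the selected key blocks.
        cols = (selected[i][:, None] * block_size + np.arange(block_size)).reshape(-1)
        # Dense softmax attention restricted to the selected keys.
        logits = (q[rows] @ k[cols].T) * scale
        logits -= logits.max(axis=1, keepdims=True)
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        out[rows] = weights @ v[cols]
    return out, selected

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out, selected = topk_block_sparse_attention(q, k, v, block_size=4, top_k=2)
```

When `top_k` equals the number of blocks, the sketch reduces to ordinary dense attention, which gives a simple sanity check. Keeping quality at longer sequences in this single-level design means raising `top_k`, which is the second limitation the abstract names.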

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Memory and Neural Computing · Random lasers and scattering media