Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers
Yifan Zhou, Zeqi Xiao, Tianyi Wei, Shuai Yang, Xingang Pan

TL;DR
This paper introduces Log-linear Sparse Attention (LLSA), a hierarchical, trainable sparse attention mechanism that significantly reduces computational costs in diffusion transformers handling long token sequences, enabling faster training and inference.
Contribution
The paper proposes LLSA, a novel hierarchical sparse attention method with a GPU-efficient implementation, improving scalability and efficiency of diffusion transformers for long sequences.
Findings
LLSA accelerates attention inference by 28.27x.
LLSA speeds up DiT training by 6.09x.
LLSA maintains high-quality image generation on high-resolution data.
Abstract
Diffusion Transformers (DiTs) set the state of the art in visual generation, yet their quadratic self-attention cost fundamentally limits scaling to long token sequences. Recent Top-K sparse attention approaches reduce the computation of DiTs by compressing tokens into block-wise representations and selecting a small set of relevant key blocks, but they still suffer from (i) a quadratic selection cost on the compressed tokens and (ii) the need for a larger K to maintain model quality as sequences grow. We identify that their inefficiency is due to the single-level design, as a single coarse level is insufficient to represent the global structure. In this paper, we introduce Log-linear Sparse Attention (LLSA), a trainable sparse attention mechanism for extremely long token sequences that reduces both selection and attention costs from quadratic to log-linear complexity by utilizing a hierarchical…
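The single-level Top-K baseline that the abstract criticizes can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' LLSA implementation: the function name `topk_block_sparse_attention`, the mean-pooling compression, and the parameters `block_size` and `top_k` are assumptions introduced here. It shows where the quadratic selection cost arises: the score matrix over compressed tokens has one entry per pair of blocks.

```python
import numpy as np

def topk_block_sparse_attention(q, k, v, block_size, top_k):
    """Single-level Top-K block-sparse attention (hypothetical baseline sketch).

    q, k, v: arrays of shape (n, d); n must be divisible by block_size.
    Returns the attention output (n, d) and the selected key-block indices.
    """
    n, d = q.shape
    assert n % block_size == 0, "sequence length must be a multiple of block_size"
    n_blocks = n // block_size

    # Compress tokens into block-wise representations.
    # Mean pooling is an assumption; the source does not specify the pooling.
    q_c = q.reshape(n_blocks, block_size, d).mean(axis=1)
    k_c = k.reshape(n_blocks, block_size, d).mean(axis=1)

    # Selection step: an (n_blocks x n_blocks) score matrix, which is
    # quadratic in the number of compressed tokens.
    scores = q_c @ k_c.T

    # For each query block, keep the indices of the top_k highest-scoring key blocks.
    selected = np.argpartition(-scores, top_k - 1, axis=1)[:, :top_k]

    k_b = k.reshape(n_blocks, block_size, d)
    v_b = v.reshape(n_blocks, block_size, d)
    out = np.empty_like(q)
    for i in range(n_blocks):
        rows = slice(i * block_size, (i + 1) * block_size)
        # Gather only the selected key/value blocks for this query block.
        k_sel = k_b[selected[i]].reshape(-1, d)
        v_sel = v_b[selected[i]].reshape(-1, d)
        logits = q[rows] @ k_sel.T / np.sqrt(d)
        logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
        w = np.exp(logits)
        w /= w.sum(axis=1, keepdims=True)
        out[rows] = w @ v_sel
    return out, selected
```

With `top_k` equal to the number of blocks, this sketch reduces to dense softmax attention. With a smaller `top_k`, each query block attends to only `top_k * block_size` keys, but the `scores` matrix still grows quadratically with the number of blocks, which is the cost the abstract attributes to single-level designs.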
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Memory and Neural Computing · Random lasers and scattering media
