Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers
Sucheng Ren, Qihang Yu, Ju He, Alan Yuille, Liang-Chieh Chen

TL;DR
GRAT is a training-free method that accelerates diffusion transformer attention by exploiting sparsity and grouping tokens, achieving over 35x speedup in large-scale image generation without quality loss.
Contribution
The paper introduces GRAT, a training-free attention acceleration technique that exploits the sparse, locally focused attention maps of pretrained diffusion transformers and groups contiguous tokens to speed up image and video generation.
Findings
35.8x speedup in large image generation
Maintains output quality without fine-tuning
Effective on pretrained Flux and HunyuanVideo models
Abstract
Diffusion-based Transformers have demonstrated impressive generative capabilities, but their high computational costs hinder practical deployment; for example, generating an image can take over an hour on an A100 GPU. In this work, we propose GRAT (GRouping first, ATtending smartly), a training-free attention acceleration strategy for fast image and video generation without compromising output quality. The key insight is to exploit the inherent sparsity in learned attention maps (which tend to be locally focused) in pretrained Diffusion Transformers and leverage better GPU parallelism. Specifically, GRAT first partitions contiguous tokens into non-overlapping groups, aligning both with GPU execution patterns and the local attention structures learned in pretrained generative Transformers. It then accelerates attention by having all query tokens within…
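The grouping step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract is truncated before it states which keys the queries in a group attend to, so the local window of neighboring groups used here (the hypothetical `radius` parameter) is only an assumed stand-in for a locally focused pattern. The function names, the 1D token layout, and the sizes are likewise illustrative assumptions.

```python
def partition_into_groups(num_tokens, group_size):
    """Split token indices 0..num_tokens-1 into contiguous, non-overlapping groups."""
    if num_tokens % group_size != 0:
        raise ValueError("this sketch assumes num_tokens is divisible by group_size")
    return [list(range(start, start + group_size))
            for start in range(0, num_tokens, group_size)]


def shared_key_groups(group_index, num_groups, radius):
    """ASSUMPTION (not from the source): all queries in a group attend to the
    same window of groups within `radius` of their own group."""
    lo = max(0, group_index - radius)
    hi = min(num_groups - 1, group_index + radius)
    return list(range(lo, hi + 1))


def attended_pairs(num_tokens, group_size, radius):
    """Count query-key pairs evaluated under the assumed group-local pattern."""
    groups = partition_into_groups(num_tokens, group_size)
    pairs = 0
    for g in range(len(groups)):
        key_count = sum(len(groups[k])
                        for k in shared_key_groups(g, len(groups), radius))
        pairs += len(groups[g]) * key_count
    return pairs


# Hypothetical sizes chosen only for illustration.
num_tokens, group_size, radius = 64, 8, 1
sparse = attended_pairs(num_tokens, group_size, radius)
dense = num_tokens * num_tokens  # full attention evaluates every query-key pair
print(f"query-key pairs: {sparse} of {dense} ({sparse / dense:.1%})")
```

Because every query in a group shares one key set, the sparsity pattern is defined per group rather than per token, which is the kind of block structure that maps well onto GPU execution.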
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need · Diffusion · Sparse Evolutionary Training
