Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers

Sucheng Ren; Qihang Yu; Ju He; Alan Yuille; Liang-Chieh Chen

arXiv:2505.14687·cs.CV·May 21, 2025

TL;DR

GRAT is a training-free method that accelerates attention in diffusion transformers by grouping tokens and exploiting the sparsity of learned attention maps, achieving over 35x speedup in large-scale image generation without compromising output quality.

Contribution

It introduces GRAT, a novel, training-free attention acceleration technique that leverages learned sparsity and grouping in pretrained diffusion transformers for faster image and video generation.

Findings

01 35.8x speedup in large image generation
02 Maintains output quality without fine-tuning
03 Effective on pretrained Flux and HunyuanVideo models

Abstract

Diffusion-based Transformers have demonstrated impressive generative capabilities, but their high computational costs hinder practical deployment; for example, generating an $8192 \times 8192$ image can take over an hour on an A100 GPU. In this work, we propose GRAT (GRouping first, ATtending smartly), a training-free attention acceleration strategy for fast image and video generation without compromising output quality. The key insight is to exploit the inherent sparsity of the attention maps learned by pretrained Diffusion Transformers, which tend to be locally focused, and to make better use of GPU parallelism. Specifically, GRAT first partitions contiguous tokens into non-overlapping groups, aligning with both GPU execution patterns and the local attention structures learned in pretrained generative Transformers. It then accelerates attention by having all query tokens within…
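The grouping step described in the abstract can be illustrated with a minimal sketch. The abstract is cut off before it says what the query tokens in a group share, so everything past the partitioning is an assumption made for illustration: here each group of queries attends to one shared set of key groups inside a 1D local window. The names `group_tokens`, `shared_key_groups`, `attended_fraction`, and the `radius` parameter are hypothetical and not taken from the paper or its repository.

```python
def group_tokens(num_tokens, group_size):
    """Partition contiguous token indices into non-overlapping groups."""
    if num_tokens % group_size != 0:
        raise ValueError("this sketch assumes num_tokens is divisible by group_size")
    return [
        list(range(start, start + group_size))
        for start in range(0, num_tokens, group_size)
    ]


def shared_key_groups(num_groups, radius):
    """For each query group, list the key groups it attends to.

    ASSUMPTION: a 1D local window of `radius` groups on each side,
    shared by every query token in the group. The real method operates
    on 2D (image) or 3D (video) token layouts and may pick keys differently.
    """
    return [
        list(range(max(0, g - radius), min(num_groups, g + radius + 1)))
        for g in range(num_groups)
    ]


def attended_fraction(num_tokens, group_size, radius):
    """Fraction of the full query-key score matrix that is actually computed."""
    groups = group_tokens(num_tokens, group_size)
    neighbors = shared_key_groups(len(groups), radius)
    # Each (query group, key group) pair is one dense group_size x group_size block.
    computed = sum(len(nbrs) * group_size * group_size for nbrs in neighbors)
    return computed / (num_tokens * num_tokens)


if __name__ == "__main__":
    print(group_tokens(8, 4))
    print(shared_key_groups(4, 1))
    print(attended_fraction(16, 4, 1))
```

Because every query in a group uses the same key set, the work breaks down into dense blocks rather than per-token sparse lookups, which is one plausible reading of the abstract's claim that grouping aligns with GPU execution patterns. As the token count grows with a fixed window, the computed fraction shrinks roughly in proportion to the window size over the sequence length.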

Code & Models

Repositories

oliverrensu/grat (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications

Methods: Softmax · Attention Is All You Need · Diffusion · Sparse Evolutionary Training