Not All Tokens Need 40 Steps: Heterogeneous Step Allocation in Diffusion Transformers for Efficient Video Generation

Ernie Chu; Vishal M. Patel

arXiv:2605.06892·cs.CV·May 11, 2026

TL;DR

Heterogeneous Step Allocation (HSA) is a training-free inference method for video diffusion transformers that assigns each spatiotemporal token its own denoising step budget based on its velocity dynamics, improving the quality-runtime trade-off under aggressive acceleration.

Contribution

The paper introduces HSA, a novel inference algorithm that allocates steps heterogeneously to tokens, reducing computation in diffusion video generation without offline profiling.
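To make the idea of a per-token step budget concrete, here is a minimal sketch in NumPy. It is an illustration only: the `motion_score` input, the linear score-to-budget mapping, and the helper names `allocate_steps` and `active_mask` are assumptions made for this example, not the allocation rule used in the paper.

```python
import numpy as np

def allocate_steps(motion_score, min_steps, max_steps):
    """Map a per-token motion score to an integer step budget.

    Hypothetical rule: tokens with larger scores get more denoising steps,
    linearly interpolated between min_steps and max_steps.
    """
    score = np.asarray(motion_score, dtype=np.float64)
    span = score.max() - score.min()
    # Normalise to [0, 1]; if every token has the same score, use the full budget.
    norm = (score - score.min()) / span if span > 0 else np.ones_like(score)
    return np.rint(min_steps + norm * (max_steps - min_steps)).astype(int)

def active_mask(budgets, step, total_steps):
    """Return a boolean mask of the tokens that are denoised at `step`.

    A token with budget b is active on exactly b of the total_steps steps,
    spread evenly: it is active whenever floor(step * b / T) increments.
    """
    budgets = np.asarray(budgets)
    return ((step + 1) * budgets) // total_steps > (step * budgets) // total_steps

# Three tokens: static, moderate motion, fast motion (made-up scores).
budgets = allocate_steps([0.0, 0.5, 1.0], min_steps=10, max_steps=40)
steps_used = sum(active_mask(budgets, s, 40).astype(int) for s in range(40))
```

In this toy setup the number of steps each token actually takes (`steps_used`) matches its budget, so total compute scales with the sum of the budgets rather than with the sequence length times the maximum step count.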

Findings

01 HSA outperforms previous caching methods and baseline models at aggressive acceleration levels.

02 HSA maintains structural integrity and quality under tight computational budgets.

03 HSA achieves a better quality-runtime trade-off without additional offline profiling.

Abstract

Diffusion Transformers (DiTs) have achieved state-of-the-art video generation quality, but they incur immense computational cost because standard inference applies the same number of denoising steps uniformly to every token in the sequence. It is well known that human vision ignores vast amounts of redundant motion. Why, then, do our densest models treat every spatiotemporal token with equal priority? In this paper, we introduce Heterogeneous Step Allocation (HSA), a training-free inference algorithm that assigns varying step budgets to different spatiotemporal tokens based on their velocity dynamics. To resolve the resulting sequence-length mismatch without sacrificing global context, HSA introduces a KV-cache synchronization mechanism that allows active tokens to attend to the full sequence while entirely bypassing inactive tokens. Furthermore, we derive a cached Euler update that…
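The KV-cache synchronization described in the abstract can be illustrated with a simplified sketch. It assumes a single attention layer with one head, and the function name `sparse_step_attention`, the projection matrices, and the in-place cache layout are hypothetical choices for this example rather than the paper's implementation. Only active tokens compute fresh queries, keys, and values; inactive tokens contribute their cached keys and values, so active tokens still attend to the full sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_step_attention(x, active, w_q, w_k, w_v, k_cache, v_cache):
    """Attention for active tokens only, against a full-length K/V cache.

    x:        (N, d) token features for the whole sequence
    active:   (N,) boolean mask of tokens denoised at this step
    w_q/k/v:  (d, d) projection matrices (hypothetical single head)
    k_cache, v_cache: (N, d) caches, updated in place for active rows
    Returns an array of shape (num_active, d).
    """
    x_act = x[active]
    # Refresh cache rows for active tokens; inactive rows keep stale entries.
    k_cache[active] = x_act @ w_k
    v_cache[active] = x_act @ w_v
    # Queries exist only for active tokens, so cost scales with num_active.
    q = x_act @ w_q
    scores = q @ k_cache.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v_cache
```

The score matrix has shape `(num_active, N)` instead of `(N, N)`, which is where the savings come from in this sketch; how the cache is kept consistent across layers and steps in the actual method is not covered here.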

Code & Models

Repositories

https://ernestchu.github.io/hsa
github
