DC-DiT: Adaptive Compute and Elastic Inference for Visual Generation via Dynamic Chunking

Akash Haridas; Utkarsh Saxena; Parsa Ashrafi Fashi; Mehdi Rezagholizadeh; Vikram Appia; Emad Barsoum

arXiv:2603.06351 · cs.CV · May 8, 2026

TL;DR

DC-DiT introduces a learned, adaptive tokenization mechanism for diffusion transformers, enabling efficient, flexible image generation with dynamic compute allocation and improved quality-compute tradeoffs.

Contribution

The paper presents a novel adaptive chunking approach that replaces static patchification, allowing for importance-based token compression and elastic inference in diffusion models.
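For contrast, here is a short sketch of the static patchification that DC-DiT replaces. It is a minimal pure-Python illustration, not code from the paper. The function names `patchify` and `static_token_count`, the H x W x C list-of-lists layout, and the 32 x 32 x 4 latent with patch size 2 used below are all assumptions chosen for illustration. The sketch shows that the token count depends only on the grid size and the patch size, never on the content.

```python
from typing import List

# Assumed layout: an H x W grid where each cell holds C channel values.
Grid = List[List[List[float]]]


def patchify(x: Grid, p: int) -> List[List[float]]:
    """Split an H x W x C grid into non-overlapping p x p patches and flatten
    each patch (row-major, then channels) into one token of length p * p * C."""
    h, w = len(x), len(x[0])
    if h % p or w % p:
        raise ValueError("H and W must be divisible by the patch size")
    tokens = []
    for i in range(0, h, p):          # patch rows
        for j in range(0, w, p):      # patch columns
            token = [v for di in range(p) for dj in range(p) for v in x[i + di][j + dj]]
            tokens.append(token)
    return tokens


def static_token_count(h: int, w: int, p: int) -> int:
    """Number of tokens static patchify produces: fixed by h, w, and p alone."""
    return (h // p) * (w // p)
```

Under these assumed sizes, a flat background and a detailed object region cost exactly the same: (32 / 2) x (32 / 2) = 256 tokens per image at every timestep. A learned, adaptive chunking scheme is meant to remove that content independence.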

Findings

01. Reduces inference FLOPs by up to 36.8% on ImageNet.
02. Improves FID scores by up to 37.8% over baseline models.
03. Enables flexible inference with a smooth quality-compute tradeoff.
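A back-of-the-envelope estimate shows why a shorter token sequence cuts FLOPs. For one transformer block with hidden size d, N tokens, and an MLP expansion ratio of 4, a common estimate is about 24·N·d^2 + 4·N^2·d FLOPs: the projections and MLP are linear in N, and the attention score and value products are quadratic in N. The sketch below is hypothetical. It ignores router, encoder, decoder, normalization, and conditioning overhead, and the sizes it uses are illustrative assumptions rather than the paper's configuration, so it does not reproduce the reported figures.

```python
def block_flops(n_tokens: int, d_model: int, mlp_ratio: int = 4) -> int:
    """Approximate forward FLOPs of one transformer block (a multiply-add counts as 2)."""
    # Q, K, V and output projections: four (N x d) @ (d x d) matmuls.
    proj = 4 * 2 * n_tokens * d_model * d_model
    # MLP: (N x d) @ (d x r*d), then (N x r*d) @ (r*d x d).
    mlp = 2 * 2 * n_tokens * d_model * (mlp_ratio * d_model)
    # Attention: Q @ K^T and softmax(Q K^T) @ V, each about 2 * N^2 * d.
    attn = 2 * 2 * n_tokens * n_tokens * d_model
    return proj + mlp + attn


def flops_saving(n_full: int, n_kept: int, d_model: int) -> float:
    """Fraction of per-block FLOPs saved by processing n_kept instead of n_full tokens."""
    return 1.0 - block_flops(n_kept, d_model) / block_flops(n_full, d_model)


if __name__ == "__main__":
    # Illustrative, assumed sizes: 256 static tokens versus 128 retained tokens.
    print(f"{flops_saving(256, 128, 1152):.3f}")
```

Because the linear terms dominate when N is much smaller than 6·d, halving the token count in this model saves slightly more than half of a block's FLOPs. The quadratic attention term is what pushes the saving above one half.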

Abstract

Diffusion Transformers rely on static patchify tokenization, assigning the same token budget to smooth backgrounds, detailed object regions, noisy early timesteps, and late-stage refinements. We introduce the Dynamic Chunking Diffusion Transformer (DC-DiT), which replaces fixed patchification with a learned encoder-router-decoder scaffold that adaptively compresses the 2D input into a shorter token sequence through a chunking mechanism learned end-to-end with diffusion training. DC-DiT allocates fewer tokens to predictable regions and noisy timesteps, and more tokens to detailed regions and later refinement stages, yielding meaningful spatial segmentations and timestep-adaptive compression schedules without supervision. Furthermore, the router provides an importance ordering over retained tokens, enabling elastic inference: a single checkpoint can be evaluated at flexible compute…
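The elastic-inference idea can be illustrated with a small sketch. If a router assigns each token an importance score, one fixed ranking lets a single model run at any token budget by keeping the top-scoring tokens. Everything below is hypothetical and is not the DC-DiT router. `boundary_scores` uses a simple dissimilarity-to-previous-token rule as a stand-in for the learned router, and `elastic_select` with its `keep_fraction` parameter is an illustrative name. Kept indices are returned in their original order, so the retained sequence preserves its spatial order.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def boundary_scores(tokens: List[Sequence[float]]) -> List[float]:
    """Hypothetical router stand-in: score each token in [0, 1] by how much it
    differs from its predecessor. The first token always gets the maximum score."""
    if not tokens:
        return []
    scores = [1.0]
    for prev, cur in zip(tokens, tokens[1:]):
        scores.append(0.5 * (1.0 - cosine(prev, cur)))
    return scores


def elastic_select(scores: Sequence[float], keep_fraction: float) -> List[int]:
    """Keep the top-scoring tokens for a given budget, returned in original order."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, round(keep_fraction * len(scores)))
    # Ties break by position, so the ranking is deterministic and budget-independent.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])
```

Because the ranking does not depend on the budget, the tokens kept at a smaller budget are always a subset of those kept at a larger one. In this sketch, that nesting is what lets a single checkpoint trade quality for compute smoothly.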