Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers

Haoran You; Connelly Barnes; Yuqian Zhou; Yan Kang; Zhenbang Du; Wei Zhou; Lingzhi Zhang; Yotam Nitzan; Xiaoyang Liu; Zhe Lin; Eli Shechtman; Sohrab Amirghodsi; Yingyan Celine Lin

arXiv:2412.16822 · cs.CV · March 28, 2025

TL;DR

DiffCR introduces a dynamic, differentiable token compression framework for diffusion transformers, optimizing computation across tokens, layers, and timesteps to enhance efficiency without sacrificing image generation quality.

Contribution

The paper proposes an adaptive inference method with differentiable compression ratios, enabling dynamic token routing and compression in diffusion transformers for improved efficiency.

Findings

01 Achieves better quality-efficiency trade-offs than prior methods.

02 Effectively adapts compression ratios across tokens, layers, and timesteps.

03 Demonstrates superior performance on text-to-image and inpainting tasks.

Abstract

Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency, making them difficult to deploy on resource-constrained devices. One major efficiency bottleneck is that existing DiTs apply equal computation across all regions of an image. However, not all image tokens are equally important, and certain localized areas, such as objects, require more computation. To address this, we propose DiffCR, a dynamic DiT inference framework with differentiable compression ratios, which automatically learns to route computation dynamically across layers and timesteps for each image token, resulting in efficient DiTs. Specifically, DiffCR integrates three features: (1) A token-level routing scheme where each DiT layer includes a router that is fine-tuned jointly with model weights to predict token importance scores. In…
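The token-level routing idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract is truncated before it explains how scores are used or how the compression ratio is made differentiable, so the sketch assumes a linear router, a fixed `keep_ratio`, and hard top-k selection. The names `route_tokens`, `w_router`, `keep_ratio`, and `layer_fn` are hypothetical.

```python
import numpy as np

def route_tokens(x, w_router, keep_ratio, layer_fn):
    """Apply layer_fn only to the highest-scoring tokens (illustrative sketch).

    x: (num_tokens, dim) token features.
    w_router: (dim,) hypothetical linear router weights (an assumption).
    keep_ratio: fraction of tokens sent through the layer (fixed here;
        the paper learns its ratios, which this sketch does not model).
    layer_fn: callable standing in for the DiT layer computation.
    """
    scores = x @ w_router                       # one importance score per token
    num_keep = max(1, int(round(keep_ratio * x.shape[0])))
    keep_idx = np.argsort(-scores)[:num_keep]   # indices of top-scoring tokens
    out = x.copy()                              # bypassed tokens pass through unchanged
    out[keep_idx] = layer_fn(x[keep_idx])       # compute only on the selected tokens
    return out, keep_idx

# Usage: 8 tokens of width 4, route a quarter of them through a toy layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
w = rng.normal(size=4)
out, idx = route_tokens(x, w, keep_ratio=0.25, layer_fn=lambda t: 2.0 * t)
```

With a lower `keep_ratio`, fewer tokens reach `layer_fn`, which is where the compute saving would come from. Hard top-k is not differentiable with respect to the ratio, so a learned-ratio method would need a different mechanism than the one shown here.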

Taxonomy

Topics: Interconnection Networks and Systems · VLSI and Analog Circuit Testing · Low-power high-performance VLSI design

Methods: Inpainting