Token Pruning for In-Context Generation in Diffusion Transformers

Junqing Lin; Xingyu Zheng; Pei Cheng; Bin Fu; Jingwei Sun; Guangzhong Sun

arXiv:2602.01609 · cs.CV · February 3, 2026

TL;DR

This paper introduces ToPi, a token pruning framework for diffusion transformers that reduces computational load during in-context image generation by selectively removing less important tokens without sacrificing quality.

Contribution

ToPi is a novel, training-free token pruning method specifically designed for diffusion transformers, utilizing sensitivity analysis and influence metrics for effective token reduction.
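To make the idea of influence-based token reduction concrete, here is a minimal sketch in plain Python. It is not ToPi's actual metric, which this page does not specify. It assumes a simple stand-in: each reference token is scored by the total attention it receives from target-token queries, and only the top-scoring fraction is kept. The function name `prune_reference_tokens`, the `keep_ratio` parameter, and the toy logits are all hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of attention logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def prune_reference_tokens(scores, num_ref, keep_ratio):
    """Pick which reference tokens to keep (hypothetical helper).

    scores[i][j] is the attention logit from target query i to key j.
    Keys 0..num_ref-1 are reference tokens; the remaining keys are targets.
    Assumption: influence = attention mass received from target queries.
    """
    attn = [softmax(row) for row in scores]
    # Sum, over all target queries, the attention each reference key receives.
    influence = [sum(row[j] for row in attn) for j in range(num_ref)]
    # Keep at least one token; the product is truncated toward zero.
    num_keep = max(1, int(num_ref * keep_ratio))
    ranked = sorted(range(num_ref), key=lambda j: influence[j], reverse=True)
    # Return kept indices in their original order.
    return sorted(ranked[:num_keep])

# Toy example: 2 target queries attending to 4 reference keys + 2 target keys.
toy_scores = [
    [2.0, 0.0, 1.0, -1.0, 0.5, 0.5],
    [1.5, -0.5, 2.0, 0.0, 0.2, 0.1],
]
kept = prune_reference_tokens(toy_scores, num_ref=4, keep_ratio=0.5)
print(kept)  # [0, 2]
```

Because the scoring needs only attention weights that the model already computes, a scheme of this kind requires no retraining, which is the sense in which such pruning can be training-free.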

Findings

01 Achieves over 30% inference speedup

02 Maintains structural fidelity and visual consistency

03 Effective across complex image generation tasks

Abstract

In-context generation significantly enhances Diffusion Transformers (DiTs) by enabling controllable image-to-image generation through reference examples. However, the resulting input concatenation drastically increases sequence length, creating a substantial computational bottleneck. Existing token reduction techniques, primarily tailored for text-to-image synthesis, fall short in this paradigm as they apply uniform reduction strategies, overlooking the inherent role asymmetry between reference contexts and target latents across spatial, temporal, and functional dimensions. To bridge this gap, we introduce ToPi, a training-free token pruning framework tailored for in-context generation in DiTs. Specifically, ToPi utilizes offline calibration-driven sensitivity analysis to identify pivotal attention layers, serving as a robust proxy for redundancy estimation. Leveraging these layers, we…
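The bottleneck described in the abstract can be illustrated with a little arithmetic. Self-attention compares every token with every other token, so its cost grows with the square of the sequence length. The sketch below counts query-key pairs for hypothetical token counts (1024 target tokens and 1024 reference tokens are assumptions chosen for illustration, not figures from the paper).

```python
def attention_pair_count(num_target, num_reference):
    """Number of query-key pairs in full self-attention over the
    concatenated sequence of target and reference tokens."""
    n = num_target + num_reference
    return n * n

# Hypothetical sizes, chosen only to show the scaling.
base = attention_pair_count(1024, 0)          # target tokens alone
with_ctx = attention_pair_count(1024, 1024)   # one reference image concatenated
pruned = attention_pair_count(1024, 512)      # half the reference tokens removed

print(with_ctx / base)  # 4.0
print(pruned / base)    # 2.25
```

Under these assumptions, doubling the sequence by concatenating a reference quadruples the attention pair count, and pruning half of the reference tokens brings it down to 2.25 times the baseline. End-to-end speedups will differ, since attention is only one part of the per-layer cost.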

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies · Multimodal Machine Learning Applications