Token Pruning for In-Context Generation in Diffusion Transformers
Junqing Lin, Xingyu Zheng, Pei Cheng, Bin Fu, Jingwei Sun, Guangzhong Sun

TL;DR
This paper introduces ToPi, a training-free token pruning framework for Diffusion Transformers (DiTs) that reduces computation during in-context image generation by removing less important tokens while maintaining structural fidelity and visual consistency.
Contribution
ToPi is a training-free token pruning method designed for in-context generation in diffusion transformers. It uses offline, calibration-driven sensitivity analysis to identify pivotal attention layers, and influence metrics to guide token reduction.
Findings
Achieves over 30% inference speedup
Maintains structural fidelity and visual consistency
Effective across complex image generation tasks
Abstract
In-context generation significantly enhances Diffusion Transformers (DiTs) by enabling controllable image-to-image generation through reference examples. However, the resulting input concatenation drastically increases sequence length, creating a substantial computational bottleneck. Existing token reduction techniques, primarily tailored for text-to-image synthesis, fall short in this paradigm as they apply uniform reduction strategies, overlooking the inherent role asymmetry between reference contexts and target latents across spatial, temporal, and functional dimensions. To bridge this gap, we introduce ToPi, a training-free token pruning framework tailored for in-context generation in DiTs. Specifically, ToPi utilizes offline calibration-driven sensitivity analysis to identify pivotal attention layers, serving as a robust proxy for redundancy estimation. Leveraging these layers, we…
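The abstract is cut off before it explains how ToPi scores tokens, so the sketch below does not reproduce the paper's method. It only illustrates the general idea the abstract motivates: concatenating reference tokens lengthens the sequence, full self-attention cost grows with the square of that length, and dropping low-importance reference tokens shrinks it. The importance score used here (mean attention a reference token receives from target tokens) and the names `prune_reference_tokens`, `attention_cost`, and `keep_ratio` are assumptions made for illustration.

```python
import math


def attention_scores(queries, keys):
    """Row-wise softmax of scaled dot products: one row per query."""
    d = len(keys[0])
    rows = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def prune_reference_tokens(target, reference, keep_ratio):
    """Return sorted indices of the reference tokens to keep.

    Assumed importance score (not taken from the paper): the mean
    attention weight each reference token receives from target tokens.
    """
    attn = attention_scores(target, reference)
    importance = [
        sum(row[j] for row in attn) / len(attn) for j in range(len(reference))
    ]
    n_keep = max(1, math.ceil(keep_ratio * len(reference)))
    ranked = sorted(range(len(reference)), key=lambda j: importance[j], reverse=True)
    return sorted(ranked[:n_keep])


def attention_cost(n_target, n_reference):
    """Number of pairwise interactions in full self-attention over the
    concatenated target + reference sequence."""
    n = n_target + n_reference
    return n * n


# Toy example: 2 target tokens, 4 reference tokens, 2-dimensional features.
target = [[1.0, 0.0], [0.9, 0.1]]
reference = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]]
kept = prune_reference_tokens(target, reference, keep_ratio=0.5)
cost_before = attention_cost(len(target), len(reference))
cost_after = attention_cost(len(target), len(kept))
```

In this toy setup only the reference side is pruned and the target tokens are left intact, which mirrors the role asymmetry the abstract describes. Which layers supply the scores, and how pruning varies across timesteps, is left open here because the source text does not specify it.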
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies · Multimodal Machine Learning Applications
