Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers

Yuxuan Yao; Yuxuan Chen; Hui Li; Kaihui Cheng; Qipeng Guo; Yuwei Sun; Zilong Dong; Jingdong Wang; Siyu Zhu

arXiv:2602.06886 · cs.CV · May 21, 2026

TL;DR

This paper identifies prompt forgetting in multimodal diffusion transformers during text-to-image generation and proposes a training-free prompt reinjection method to improve instruction-following and image quality.

Contribution

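The paper introduces prompt reinjection, a training-free technique that mitigates prompt forgetting in MMDiTs by reinjecting prompt representations from early layers of the text branch into later layers, improving instruction following and image quality.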
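
The abstract describes the mechanism as reinjecting prompt representations from early layers of the text branch into later layers. The sketch below illustrates that idea on a toy text branch; it is not the authors' implementation. Everything in it is an assumption: `make_toy_block` is a hypothetical residual block that ignores image latents (a real MMDiT block attends jointly to text and image tokens), and the convex blend controlled by `alpha` is an assumed reinjection rule, not the one from the paper.

```python
import numpy as np


def make_toy_block(rng, dim):
    """Hypothetical stand-in for one text-branch block: a residual tanh projection.

    A real MMDiT block also attends jointly to image latents; that is omitted here.
    """
    w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    return lambda h: h + np.tanh(h @ w)


def forward_with_reinjection(blocks, prompt, source_layer=None, target_layers=(), alpha=0.5):
    """Run the toy text branch, reinjecting an early-layer prompt representation.

    source_layer:  index of the block whose output is cached for reinjection.
    target_layers: indices of later blocks whose input is blended with the cache.
    alpha:         blend weight; the convex blend itself is an assumed rule.
    """
    cached = None
    h = prompt
    for i, block in enumerate(blocks):
        # Reinject only once the early-layer representation has been cached.
        if cached is not None and i in target_layers:
            h = (1.0 - alpha) * h + alpha * cached
        h = block(h)
        if i == source_layer:
            cached = h
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_tokens, depth = 16, 8, 12
    blocks = [make_toy_block(rng, dim) for _ in range(depth)]
    prompt = rng.standard_normal((n_tokens, dim))  # stand-in for encoded prompt tokens
    baseline = forward_with_reinjection(blocks, prompt)
    reinjected = forward_with_reinjection(
        blocks, prompt, source_layer=2, target_layers=range(8, depth), alpha=0.3
    )
    print(baseline.shape, reinjected.shape)
```

Caching one early output and blending it into later inputs keeps the procedure training-free: no weights change, only the forward pass does. The choice of source layer, target layers, and blend weight is left as free parameters here rather than taken from the paper.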

Findings

1. Prompt reinjection improves instruction-following capabilities.
2. It yields better scores on preference, aesthetics, and quality metrics.
3. The effect is verified across multiple models and benchmarks.

Abstract

Multimodal Diffusion Transformers (MMDiTs) for text-to-image generation maintain separate text and image branches, with bidirectional information flow between text tokens and visual latents throughout denoising. In this setting, we observe a prompt forgetting phenomenon: the semantics of the prompt representation in the text branch are progressively forgotten as depth increases. We further verify this effect on three representative MMDiTs (SD3, SD3.5, and FLUX.1) by probing linguistic attributes of the representations across the layers of the text branch. Motivated by these findings, we introduce a training-free approach, prompt reinjection, which reinjects prompt representations from early layers into later layers to alleviate this forgetting. Experiments on GenEval, DPG, and T2I-CompBench++ show consistent gains in instruction-following capability, along with improvements on metrics…
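
The layer-wise probing described in the abstract can be illustrated with a linear probe fitted separately to each layer's representations: if held-out probe accuracy for a linguistic attribute falls with depth, the attribute is becoming harder to read out. This is a minimal sketch under stated assumptions. The least-squares probe, the function names `linear_probe_accuracy` and `probe_all_layers`, and the use of one pooled feature vector per prompt are all illustrative choices, and the demo data are synthetic, built only to exercise the code. They do not reproduce the paper's measurements.

```python
import numpy as np


def linear_probe_accuracy(features, labels, train_frac=0.7, seed=0):
    """Fit a least-squares linear probe on a train split; return held-out accuracy.

    features: (n_samples, dim) array, e.g. pooled text-token states from one layer.
    labels:   (n_samples,) integer array of binary attribute labels in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    idx = rng.permutation(n)
    n_train = int(train_frac * n)
    tr, te = idx[:n_train], idx[n_train:]
    # Append a bias column and regress onto targets in {-1, +1}.
    x = np.hstack([features, np.ones((n, 1))])
    y = 2.0 * labels - 1.0
    w, *_ = np.linalg.lstsq(x[tr], y[tr], rcond=None)
    preds = (x[te] @ w > 0).astype(labels.dtype)
    return float(np.mean(preds == labels[te]))


def probe_all_layers(layer_features, labels):
    """Probe each layer's features independently with the same labels."""
    return [linear_probe_accuracy(f, labels) for f in layer_features]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, dim, depth = 400, 32, 6
    labels = rng.integers(0, 2, size=n)
    direction = rng.standard_normal(dim)
    # Synthetic features whose attribute signal shrinks with depth (illustration only).
    layer_features = [
        np.outer(2 * labels - 1, direction) * (1.0 - k / depth)
        + rng.standard_normal((n, dim))
        for k in range(depth)
    ]
    for k, acc in enumerate(probe_all_layers(layer_features, labels)):
        print(f"layer {k}: probe accuracy {acc:.2f}")
```

A linear probe is a deliberately weak reader: high accuracy means the attribute is linearly decodable at that layer, so a decline across layers indicates that the information has become less accessible, not necessarily that it has been erased entirely.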

Code & Models

Models

fudan-generative-ai/PromptReinjection (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Enhancement Techniques