Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models

Byeonghu Na; Minsang Park; Gyuwon Sim; Donghyeok Shin; HeeSun Bae; Mina Kang; Se Jung Kwon; Wanmo Kang; Il-Chul Moon

arXiv:2510.23974 · cs.LG · October 29, 2025

TL;DR

The paper introduces DATE, a method that dynamically updates text embeddings during diffusion sampling to improve text-image alignment without extra training. It preserves the model's generative capability and extends to tasks such as multi-concept generation and image editing.

Contribution

We propose a novel adaptive text embedding method that updates embeddings at each diffusion step, improving alignment without additional training.

Findings

01. DATE improves text-image alignment over fixed embeddings.
02. The method enhances multi-concept generation and image editing.
03. It maintains generative capability while adapting embeddings dynamically.

Abstract

Text-to-image diffusion models rely on text embeddings from a pre-trained text encoder, but these embeddings remain fixed across all diffusion timesteps, limiting their adaptability to the generative process. We propose Diffusion Adaptive Text Embedding (DATE), which dynamically updates text embeddings at each diffusion timestep based on intermediate perturbed data. We formulate an optimization problem and derive an update rule that refines the text embeddings at each sampling step to improve alignment and preference between the mean predicted image and the text. This allows DATE to dynamically adapt the text conditions to the reverse-diffused images throughout diffusion sampling without requiring additional model training. Through theoretical analysis and empirical results, we show that DATE maintains the generative capability of the model while providing superior text-image alignment…
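The abstract does not state the update rule itself, so the following is only a minimal sketch of the general idea under explicit assumptions, not the authors' method. It assumes (1) the mean predicted image is the standard noise-prediction estimate x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps(x_t, c)) / sqrt(alpha_bar_t); (2) some differentiable score rates how well x0_hat matches the original text embedding c0; and (3) the optimization problem is "maximize the score subject to ||c - c0|| <= rho". The first-order solution of that problem is a step of length rho along the normalized gradient. The names adaptive_embedding_step, toy_eps, toy_score and the parameter rho are hypothetical. Central finite differences stand in for autodiff so the sketch stays dependency-free, and alpha_bar stands in for the timestep t.

```python
import math
from typing import Callable, List

Vec = List[float]

def norm(v: Vec) -> float:
    return math.sqrt(sum(x * x for x in v))

def tweedie_mean(x_t: Vec, eps: Vec, alpha_bar: float) -> Vec:
    """Mean predicted clean image: (x_t - sqrt(1 - a_bar) * eps) / sqrt(a_bar)."""
    m, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xi - s * ei) / m for xi, ei in zip(x_t, eps)]

def numerical_grad(f: Callable[[Vec], float], c: Vec, h: float = 1e-5) -> Vec:
    """Central finite-difference gradient of a scalar function f at c."""
    g = []
    for i in range(len(c)):
        up, dn = c.copy(), c.copy()
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

def adaptive_embedding_step(c0: Vec, x_t: Vec, alpha_bar: float,
                            eps_model: Callable[[Vec, Vec], Vec],
                            score: Callable[[Vec, Vec], float],
                            rho: float) -> Vec:
    """Hypothetical DATE-style step: the first-order solution of
    max_c score(x0_hat(c), c0) s.t. ||c - c0|| <= rho."""
    def objective(c: Vec) -> float:
        x0_hat = tweedie_mean(x_t, eps_model(x_t, c), alpha_bar)
        return score(x0_hat, c0)  # alignment judged against the original text embedding
    g = numerical_grad(objective, c0)
    gn = norm(g)
    if gn == 0.0:
        return list(c0)  # no ascent direction: keep the original embedding
    return [ci + rho * gi / gn for ci, gi in zip(c0, g)]

# Toy stand-ins (hypothetical): a linear "denoiser" and a score that rewards
# x0_hat for being close to a fixed target vector.
def toy_eps(x_t: Vec, c: Vec) -> Vec:
    return [0.5 * xi - ci for xi, ci in zip(x_t, c)]

target = [1.0, -1.0, 0.5]

def toy_score(x0_hat: Vec, c: Vec) -> float:
    return -sum((a - b) ** 2 for a, b in zip(x0_hat, target))

c0 = [0.2, 0.1, -0.3]
x_t = [0.4, -0.2, 0.9]
c1 = adaptive_embedding_step(c0, x_t, alpha_bar=0.6, eps_model=toy_eps,
                             score=toy_score, rho=0.1)

def toy_reward(c: Vec) -> float:
    return toy_score(tweedie_mean(x_t, toy_eps(x_t, c), 0.6), c0)

print(round(norm([a - b for a, b in zip(c1, c0)]), 6), toy_reward(c1) > toy_reward(c0))
# prints: 0.1 True
```

The radius rho bounds how far the adapted embedding can drift from the encoder's output. This is one plausible way to keep the conditioning close to what the model was trained on while still moving toward better alignment. In a real sampler, a step like this would run once per timestep, before the denoiser is called with the updated embedding.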

Videos

Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models · SlidesLive