Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
Juntu Zhao, Junyu Deng, Yixin Ye, Chongxuan Li, Zhijie Deng, Dequan Wang

TL;DR
This paper identifies latent concept misalignment in text-to-image diffusion models, investigates its scope using large language models, and proposes an automated pipeline to improve semantic alignment, reducing errors and enhancing model robustness.
Contribution
The paper introduces the notion of Latent Concept Misalignment (LC-Mis) and develops an automated pipeline that aligns the latent semantics of diffusion models with text prompts, improving text-image alignment.
Findings
Significant reduction in LC-Mis errors
Enhanced robustness and versatility of models
Validated approach with empirical assessments
Abstract
Advancements in text-to-image diffusion models have broadened extensive downstream practical applications, but such models often encounter misalignment issues between text and image. Taking the generation of a combination of two disentangled concepts as an example, say given the prompt "a tea cup of iced coke", existing models usually generate a glass cup of iced coke because the iced coke usually co-occurs with the glass cup instead of the tea one during model training. The root of such misalignment is attributed to the confusion in the latent semantic space of text-to-image diffusion models, and hence we refer to the "a tea cup of iced coke" phenomenon as Latent Concept Misalignment (LC-Mis). We leverage large language models (LLMs) to thoroughly investigate the scope of LC-Mis, and develop an automated pipeline for aligning the latent semantics of diffusion models to text prompts.…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies
Methods: Diffusion
