Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers
Kanghyun Baek, Jaihyun Lew, Chaehun Shin, Jungbeom Lee, Sungroh Yoon

TL;DR
This paper introduces Omission Signal Intervention (OSI), a method that amplifies omission signals in multimodal diffusion transformers to reduce concept omission in text-to-image generation.
Contribution
The paper introduces OSI, which uses linear probing of text-token embeddings to identify an omission signal and then amplifies that signal to encourage missing concepts to appear in the generated image.
Findings
OSI significantly reduces concept omission in experiments on FLUX.1-Dev and SD3.5-Medium.
OSI alleviates concept omission even in extreme omission scenarios.
Linear probing reveals omission signals in text embeddings.
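The probing finding can be sketched as a simple logistic-regression probe over token embeddings. This is a minimal illustration, assuming each text-token embedding is a plain vector labeled 1 if its concept was omitted from the generated image and 0 otherwise. The paper's actual probe setup, features, and labeling are not described on this page. The toy data, where the first dimension is deliberately constructed to carry the "omission" information, and the names `train_linear_probe` and `predict` are hypothetical and are not the authors' code.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(embeddings, labels, lr=0.1, epochs=200):
    """Fit p(omitted | h) = sigmoid(w . h + b) by full-batch gradient descent.

    Hypothetical setup: embeddings are equal-length float lists (one per text
    token); labels are 1 if the token's concept was omitted, else 0.
    """
    dim = len(embeddings[0])
    w = [0.0] * dim
    b = 0.0
    n = len(embeddings)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for h, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
            err = p - y  # d(binary cross-entropy)/dz
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, h):
    # 1 = probe predicts "concept omitted", 0 = "concept present".
    return 1 if sum(wi * hi for wi, hi in zip(w, h)) + b > 0 else 0


# Toy (synthetic) data: dimension 0 encodes omission, dimensions 1-2 are noise.
rng = random.Random(0)
X, y = [], []
for _ in range(100):
    label = rng.randint(0, 1)
    shift = 1.0 if label else -1.0
    X.append([shift + rng.gauss(0, 0.3), rng.gauss(0, 1.0), rng.gauss(0, 1.0)])
    y.append(label)

w, b = train_linear_probe(X, y)
acc = sum(predict(w, b, h) == t for h, t in zip(X, y)) / len(y)
```

If a linear probe like this separates the two classes well, the learned weight vector `w` gives a candidate "omission direction" in embedding space; here it should load mostly on the first dimension because that is where the toy signal was placed.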
Abstract
Multimodal Diffusion Transformers (MM-DiTs) have achieved remarkable progress in text-to-image generation, yet they frequently suffer from concept omission, where specified objects or attributes fail to emerge in the generated image. By performing linear probing on text tokens, we demonstrate that text embeddings can distinguish a characteristic "omission signal" representing the absence of target concepts. Leveraging this insight, we propose Omission Signal Intervention (OSI), which amplifies the omission signal to actively catalyze the generation of missing concepts. Comprehensive experiments on FLUX.1-Dev and SD3.5-Medium demonstrate that OSI significantly alleviates concept omission even in extreme scenarios.
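The abstract does not specify how OSI amplifies the omission signal. One common way to act on a linear-probe direction is to rescale the component of an embedding that lies along it, h' = h + (alpha - 1)(h · u)u with u the unit probe direction, and the minimal sketch below does this for selected target tokens, assuming the direction is already known. The names `amplify_direction` and `intervene`, the scale `alpha`, and the choice to edit text-token embeddings directly are all assumptions for illustration, not OSI's published procedure.

```python
import math


def unit(v):
    # Normalize a direction vector; a zero vector has no direction.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("probe direction must be non-zero")
    return [x / norm for x in v]


def amplify_direction(h, direction, alpha):
    """Scale the component of h along `direction` by alpha (assumed steering rule).

    h' = h + (alpha - 1) * (h . u) * u, where u = direction / ||direction||.
    The part of h orthogonal to u is left unchanged.
    """
    u = unit(direction)
    proj = sum(hi * ui for hi, ui in zip(h, u))
    return [hi + (alpha - 1.0) * proj * ui for hi, ui in zip(h, u)]


def intervene(token_embeddings, target_indices, direction, alpha=2.0):
    # Hypothetical: edit only the tokens of concepts at risk of omission.
    targets = set(target_indices)
    return [
        amplify_direction(h, direction, alpha) if i in targets else list(h)
        for i, h in enumerate(token_embeddings)
    ]


tokens = [[0.5, 1.0, 0.0], [0.2, -0.3, 0.4]]
direction = [2.0, 0.0, 0.0]
out = intervene(tokens, target_indices=[1], direction=direction, alpha=3.0)
# out[0] is unchanged; out[1] is approximately [0.6, -0.3, 0.4]
```

Rescaling only the projection keeps the rest of the token's representation intact, so the edit is confined to the probed direction; with alpha = 1 the embeddings are returned unchanged.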
