RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment
Zutao Jiang, Guian Fang, Jianhua Han, Guansong Lu, Hang Xu, Shengcai Liao, Xiaojun Chang, Xiaodan Liang

TL;DR
RealignDiff introduces a two-stage coarse-to-fine semantic re-alignment approach for text-to-image diffusion models, significantly enhancing the alignment between generated images and textual prompts, leading to improved visual quality and semantic accuracy.
Contribution
The paper proposes a novel two-stage re-alignment method using BLIP-2 and local dense captioning to better align images with text prompts in diffusion models.
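In the abstract, the coarse stage is described as a caption reward: BLIP-2 captions the generated image, and the reward measures the semantic discrepancy between that caption and the prompt. The sketch below shows only the shape of that idea, under stated assumptions. The captioner is an injected callable standing in for BLIP-2, and the text similarity is a plain bag-of-words cosine. Both are hypothetical placeholders, not the paper's scoring function. The names `caption_reward` and `bow_cosine` are illustrative.

```python
from collections import Counter
from math import sqrt
from typing import Callable

# Hypothetical stand-in for BLIP-2: any image -> caption callable can be
# injected, so this sketch runs without model weights.
Captioner = Callable[[object], str]


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words count vectors.

    An assumed, deliberately simple similarity measure; the paper's
    actual comparison of caption and prompt may differ.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = sqrt(sum(c * c for c in va.values()))
    nb = sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def caption_reward(image: object, prompt: str, captioner: Captioner) -> float:
    """Higher reward means a smaller discrepancy between the image's
    generated caption and the text prompt."""
    caption = captioner(image)
    return bow_cosine(caption, prompt)


if __name__ == "__main__":
    fake_captioner = lambda img: "a red cube on a table"  # placeholder for BLIP-2
    r = caption_reward(None, "a red cube on a wooden table", fake_captioner)
    print(round(r, 4))
```

A caption that repeats the prompt scores 1.0 and one sharing no words scores 0.0. A reward of this form can then rank candidate images or act as a training signal. How RealignDiff actually consumes its reward goes beyond this summary.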
Findings
Outperforms baseline re-alignment methods in visual quality
Achieves higher semantic similarity on MS-COCO and ViLG-300 datasets
Demonstrates effectiveness of coarse-to-fine re-alignment approach
Abstract
Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from textual descriptions. However, these approaches have faced challenges in precisely aligning the generated visual content with the textual concepts described in the prompts. In this paper, we propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff, aimed at improving the alignment between text and images in text-to-image diffusion models. In the coarse semantic re-alignment phase, a novel caption reward, leveraging the BLIP-2 model, is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt. Subsequently, the fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images…
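The abstract names a re-weighting attention modulation module for the fine stage but gives no formula. A common way to steer cross-attention is an additive bias on the logits of selected (pixel, token) pairs before the softmax. The sketch below assumes that approach. Each local dense caption supplies a region mask and the prompt tokens describing that region, and pixels inside the region get a boost toward those tokens. The region format, the `boost` hyperparameter, and the function names are all hypothetical.

```python
import math
from typing import List, Sequence, Set


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_cross_attention(
    scores: Sequence[Sequence[float]],      # [num_pixels][num_tokens] raw q.k/sqrt(d) logits
    region_masks: Sequence[Sequence[bool]],  # one mask of length num_pixels per region
    region_tokens: Sequence[Set[int]],       # token indices that describe each region
    boost: float,                            # hypothetical additive logit bonus
) -> List[List[float]]:
    """Return attention probabilities with region-token pairs up-weighted.

    Assumption: modulation is an additive logit bias applied before the
    softmax. A pixel covered by several regions that share a token
    receives the bonus once per region.
    """
    out: List[List[float]] = []
    for p, row in enumerate(scores):
        logits = list(row)
        for mask, toks in zip(region_masks, region_tokens):
            if mask[p]:
                for t in toks:
                    logits[t] += boost
        out.append(softmax(logits))
    return out


if __name__ == "__main__":
    # Two pixels, three tokens; pixel 0 lies in a region described by token 1.
    attn = reweighted_cross_attention(
        scores=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        region_masks=[[True, False]],
        region_tokens=[{1}],
        boost=math.log(4.0),
    )
    print([[round(v, 3) for v in row] for row in attn])
```

With `boost = log 4`, pixel 0 puts two thirds of its attention on token 1 and pixel 1 stays uniform. An additive logit bias keeps each row a valid distribution, which is why it is a convenient assumption for a sketch. The paper's module may re-weight attention differently.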
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques
Methods: Diffusion
