Bridging the Gap: Aligning Text-to-Image Diffusion Models with Specific Feedback

Xuexiang Niu; Jinping Tang; Lei Wang; Ge Zhu

arXiv:2412.00122 · cs.CV · December 3, 2024

TL;DR

This paper introduces a fine-tuning method for text-to-image diffusion models that improves alignment accuracy by focusing on object categories and quantities, using a novel reward-based feedback mechanism.

Contribution

The authors propose a new feedback learning approach emphasizing entity accuracy, along with a dataset for compositional generation and a metric for alignment evaluation.

Findings

1. Outperforms state-of-the-art methods in alignment and fidelity
2. Effective in improving object category and quantity accuracy
3. Provides a new benchmark dataset for compositional generation

Abstract

Learning from feedback has been shown to enhance the alignment between text prompts and images in text-to-image diffusion models. However, because the feedback content lacks focus, especially regarding object type and quantity, these techniques struggle to match text and images accurately when prompts specify such details. To address this issue, we propose an efficient fine-tuning method with specific reward objectives, consisting of three stages. First, object detection is run on images generated by the diffusion model to obtain the object categories and quantities. Meanwhile, the confidence of each category and quantity is derived from the detection results and the given prompts. Next, we define a novel matching score, based on these confidences, to measure text-image alignment; it guides the model's feedback learning in the form of a reward function. Finally, we fine-tune the diffusion model by…
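
The abstract does not give formulas for the confidences or the matching score, so the following is only a minimal Python sketch, under stated assumptions, of how the detection and scoring stages could fit together. Everything in it is hypothetical: the `Detection` record, the 0.5 score threshold, using the highest detector score as the category confidence, the 1 / (1 + |detected - requested|) quantity confidence, and averaging their product over the objects named in the prompt. It also assumes the prompt has already been parsed into a mapping from category to requested count. The result is a scalar of the kind that could serve as a reward; the fine-tuning stage itself is truncated in the abstract and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # predicted object category (hypothetical detector output)
    score: float  # detector confidence in [0, 1]


def category_confidence(dets: list[Detection], category: str, threshold: float = 0.5) -> float:
    """Hypothetical: highest detector score for `category` at or above `threshold`, else 0.0."""
    scores = [d.score for d in dets if d.label == category and d.score >= threshold]
    return max(scores, default=0.0)


def quantity_confidence(dets: list[Detection], category: str, target: int, threshold: float = 0.5) -> float:
    """Hypothetical: 1.0 when the detected count equals the requested count, decaying with the gap."""
    count = sum(1 for d in dets if d.label == category and d.score >= threshold)
    return 1.0 / (1.0 + abs(count - target))


def matching_score(dets: list[Detection], prompt_objects: dict[str, int], threshold: float = 0.5) -> float:
    """Hypothetical matching score: mean of category x quantity confidence over prompt objects.

    Detections of categories not named in the prompt are ignored by this definition.
    """
    if not prompt_objects:
        return 0.0
    total = 0.0
    for category, target in prompt_objects.items():
        total += (category_confidence(dets, category, threshold)
                  * quantity_confidence(dets, category, target, threshold))
    return total / len(prompt_objects)
```

For the prompt "two cats and a dog", parsed as `{"cat": 2, "dog": 1}`, detections of two cats (0.9, 0.8), a low-confidence dog (0.3) and a bird (0.7) score 0.45 under this sketch: the cats contribute 0.9 × 1.0, the dog falls below the threshold and contributes 0, and the unrequested bird is ignored. A real reward would likely also penalize extra objects; that choice is left open here.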

Taxonomy

Topics: Computational and Text Analysis Methods