Aligning Text-to-Image Models using Human Feedback

Kimin Lee; Hao Liu; Moonkyung Ryu; Olivia Watkins; Yuqing Du; Craig Boutilier; Pieter Abbeel; Mohammad Ghavamzadeh; Shixiang Shane Gu

arXiv:2302.12192·cs.LG·February 24, 2023·35 cites

TL;DR

This paper introduces a human-feedback-based fine-tuning approach for text-to-image models. Human evaluations are used to train a reward model, which then guides fine-tuning so that generated images follow their textual prompts more closely.

Contribution

It presents a novel three-stage method for aligning text-to-image models using human feedback: feedback collection, reward function training, and model fine-tuning.
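The second stage trains a reward function to predict human judgments of image-text alignment. This page does not specify the reward model's inputs, architecture, or loss. The sketch below is therefore only a stand-in, built on three assumptions. First, each image-prompt pair is represented by a hypothetical precomputed feature vector `f(x, z)`. Second, raters give a binary label (1 = aligned, 0 = not aligned). Third, the reward is a logistic regression fit with binary cross-entropy. The names `train_reward_model` and `reward` are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reward(w: List[float], b: float, feats: Sequence[float]) -> float:
    # Predicted probability that a rater labels the pair as "aligned".
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feats)) + b)


def train_reward_model(
    data: Sequence[Tuple[List[float], int]],
    lr: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit r(x, z) = sigmoid(w . f(x, z) + b) to binary human labels.

    Each example is (features, label): `features` stands in for a
    hypothetical image-text feature vector f(x, z); `label` is 1 if the
    rater judged the image aligned with the prompt, else 0.
    """
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for feats, label in data:
            # Gradient of binary cross-entropy w.r.t. the logit is (p - label).
            err = reward(w, b, feats) - label
            for i, fi in enumerate(feats):
                grad_w[i] += err * fi
            grad_b += err
        # Full-batch gradient descent step on the mean loss.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy labels: the first (hypothetical) feature is high for aligned pairs.
toy = [([1.0, 0.2], 1), ([0.9, -0.1], 1), ([-1.0, 0.3], 0), ([-0.8, -0.2], 0)]
w, b = train_reward_model(toy)
print(reward(w, b, [1.0, 0.0]) > reward(w, b, [-1.0, 0.0]))  # True
```

The fitted reward is a probability in (0, 1). Under this assumed setup, it could therefore serve directly as a non-negative per-sample weight in the fine-tuning stage.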

Findings

01. Improved accuracy in generating specified object attributes
02. Effective use of human feedback for model alignment
03. Analysis of design choices affecting alignment-fidelity tradeoffs

Abstract

Deep generative models have shown impressive results in text-to-image synthesis. However, current text-to-image models often generate images that are inadequately aligned with text prompts. We propose a fine-tuning method for aligning such models using human feedback, comprising three stages. First, we collect human feedback assessing model output alignment from a set of diverse text prompts. We then use the human-labeled image-text dataset to train a reward function that predicts human feedback. Lastly, the text-to-image model is fine-tuned by maximizing reward-weighted likelihood to improve image-text alignment. Our method generates objects with specified colors, counts and backgrounds more accurately than the pre-trained model. We also analyze several design choices and find that careful investigations on such design choices are important in balancing the alignment-fidelity…
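The final stage fine-tunes the generator by maximizing reward-weighted likelihood. This amounts to minimizing -(1/N) * sum_i r(x_i, z) * log p_theta(x_i | z), where each generated image x_i is scored by the learned reward r.

The minimal sketch below uses a toy model as a stand-in for the text-to-image model. For a single fixed prompt z, p_theta is a softmax over K discrete candidate images with logits theta. It only shows how the reward weighting shifts probability mass. It does not reproduce the paper's model or training setup. The function `reward_weighted_finetune` and the rewards used below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_weighted_finetune(
    logits: Sequence[float],
    samples: Sequence[Tuple[int, float]],
    lr: float = 0.5,
    steps: int = 100,
) -> List[float]:
    """Minimize -(1/N) * sum_i r_i * log p(x_i) for a toy categorical model.

    `logits` parameterize p_theta over K candidate images for one prompt;
    `samples` are (image_index, reward) pairs, e.g. generated images
    scored by a learned reward function.
    """
    theta = list(logits)
    n = len(samples)
    for _ in range(steps):
        probs = softmax(theta)
        grad = [0.0] * len(theta)
        for idx, r in samples:
            # d/dtheta of -log softmax(theta)[idx] is probs - onehot(idx),
            # scaled here by the sample's reward.
            for k in range(len(theta)):
                grad[k] += r * (probs[k] - (1.0 if k == idx else 0.0))
        theta = [t - lr * g / n for t, g in zip(theta, grad)]
    return theta


# Three candidate images for one prompt, initially equally likely.
# Hypothetical rewards: image 0 is well aligned, image 2 is not.
samples = [(0, 0.9), (1, 0.5), (2, 0.1)]
before = softmax([0.0, 0.0, 0.0])
after = softmax(reward_weighted_finetune([0.0, 0.0, 0.0], samples))
print([round(p, 2) for p in before])  # [0.33, 0.33, 0.33]
print(after[0] > after[1] > after[2])  # True
```

Suppose each candidate appears once in the batch. Then the minimizer of this toy objective sets p(k) proportional to its reward r_k. Well-aligned outputs gain probability, while low-reward ones are down-weighted rather than eliminated, as long as their rewards stay positive.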
