VersaT2I: Improving Text-to-Image Models with Versatile Reward

Jianshu Guo; Wenhao Chai; Jie Deng; Hsiang-Wei Huang; Tian Ye; Yichen Xu; Jiawei Zhang; Jenq-Neng Hwang; and Gaoang Wang

arXiv:2403.18493·cs.CV·March 28, 2024·1 cites

Open Access

TL;DR

VersaT2I is a versatile training framework that enhances text-to-image models by incorporating multiple quality rewards, improving aesthetics, accuracy, and fidelity without manual annotations or architecture changes.

Contribution

It introduces a multi-aspect reward-based finetuning method with a gating function, enabling improved T2I performance across various quality metrics.
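The summary here does not say what form the gating function takes. As a minimal sketch, assume one LoRA adapter per quality aspect and a softmax gate that mixes their low-rank updates into the frozen base weight, W = W0 + sum_k g_k * B_k A_k. The names `gated_lora_weight` and `gate_logits`, and the softmax form itself, are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one gate logit per aspect."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, enough for tiny illustrative matrices."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gated_lora_weight(
    base: Matrix,
    adapters: Sequence[Tuple[Matrix, Matrix]],
    gate_logits: Sequence[float],
) -> Tuple[Matrix, List[float]]:
    """Merge per-aspect LoRA updates into a copy of the base weight.

    base:        d_out x d_in frozen weight
    adapters:    one (B, A) pair per aspect, B is d_out x r, A is r x d_in
    gate_logits: one logit per aspect (assumed gate parameterisation)
    """
    gates = softmax(gate_logits)
    merged = [[float(v) for v in row] for row in base]
    for gate, (b_mat, a_mat) in zip(gates, adapters):
        delta = matmul(b_mat, a_mat)  # low-rank update for this aspect
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += gate * value
    return merged, gates
```

Because the gates sum to one, no single aspect's update can dominate unless its logit does, which is one plausible way to keep aspects from conflicting. In a real model the logits could depend on the prompt or the layer; the sketch leaves that choice open.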

Findings

01 Outperforms baseline models on multiple quality benchmarks

02 Effectively combines multiple quality aspects without conflicts

03 Does not require manual annotations or architecture modifications

Abstract

Recent text-to-image (T2I) models have benefited from large-scale, high-quality data and demonstrate impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically accurate, faithful to the text, and of good low-level quality. We present VersaT2I, a versatile training framework that can boost the performance of any T2I model using multiple rewards. We decompose image quality into several aspects, such as aesthetics, text-image alignment, geometry, and low-level quality. Then, for each quality aspect, we select images generated by the model that score highly in that aspect and use them as the training set to finetune the T2I model with Low-Rank Adaptation (LoRA). Furthermore, we introduce a gating function to combine multiple quality aspects, which avoids conflicts between them. Our method is easy to…
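The per-aspect selection step described in the abstract can be sketched as follows. This is a hedged illustration only: the abstract does not state how images are scored or how many are kept, so the scorer callables, the `keep_ratio` parameter, and the global (rather than per-prompt) ranking are all assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (prompt, identifier of an image the model generated)
Scorer = Callable[[str, str], float]  # hypothetical reward: (prompt, image) -> score


def select_training_sets(
    samples: Sequence[Sample],
    scorers: Dict[str, Scorer],
    keep_ratio: float = 0.25,
) -> Dict[str, List[Sample]]:
    """For each quality aspect, keep the model's own top-scoring generations.

    Each returned list would serve as the finetuning set for that aspect's
    LoRA adapter. keep_ratio is an assumed hyperparameter, not a value
    taken from the paper.
    """
    selected: Dict[str, List[Sample]] = {}
    for aspect, score in scorers.items():
        # Rank all generations by this aspect's reward, best first.
        ranked = sorted(samples, key=lambda s: score(s[0], s[1]), reverse=True)
        keep = max(1, int(len(ranked) * keep_ratio))
        selected[aspect] = ranked[:keep]
    return selected
```

In practice each scorer would wrap a pretrained reward model for one aspect (an aesthetic predictor, a text-image alignment model, and so on), and the selected pairs would be passed to a standard LoRA finetuning loop. No human labels enter the pipeline, which matches the claim that the method needs no manual annotation.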

Taxonomy

Topics: Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training