Preference Adaptive and Sequential Text-to-Image Generation

Ofir Nabati; Guy Tennenholtz; ChihWei Hsu; Moonkyung Ryu; Deepak Ramachandran; Yinlam Chow; Xiang Li; Craig Boutilier

arXiv:2412.10419 · cs.CV · May 29, 2025

Open Access

TL;DR

This paper introduces PASTA, an RL-based agent that iteratively refines text-to-image generation using user preference feedback, enabling adaptive, multi-turn, collaborative image creation; in human evaluations it outperformed baseline methods.

Contribution

It presents a novel RL agent for sequential T2I generation that adapts to user preferences and introduces a new dataset of human preference sequences for training and evaluation.

Findings

01. PASTA significantly outperforms baseline methods in human evaluations.

02. The dataset of sequential preferences enables better modeling of user choices.

03. Adaptive prompt expansion improves collaborative image creation.

Abstract

We address the problem of interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent that iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage together with large-scale open-source (non-sequential) datasets to construct user-preference and user-choice models using an EM strategy and to identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user's intent. We evaluate…
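The multi-turn loop described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function names, the softmax choice model, and the greedy top-k slate selection are all assumptions made for illustration (the paper describes a learned value function, an LMM that proposes expansions, and a slate that is diverse as well as high-value). The toy proposer and toy value function are hypothetical stand-ins for those learned components.

```python
import math
import random


def choice_probabilities(utilities, temperature=1.0):
    """Softmax choice model over a slate (an assumed form of user-choice model)."""
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp((u - top) / temperature) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def select_slate(candidates, value_fn, slate_size):
    """Greedy stand-in for slate selection: keep the highest-value candidates."""
    return sorted(candidates, key=value_fn, reverse=True)[:slate_size]


def run_session(prompt, propose_fn, value_fn, utility_fn,
                turns=3, slate_size=2, seed=0):
    """Simulate a multi-turn session and return the sequence of chosen prompts."""
    rng = random.Random(seed)
    history = [prompt]
    for _ in range(turns):
        candidates = propose_fn(history[-1])        # hypothetical LMM proposer
        slate = select_slate(candidates, value_fn, slate_size)
        probs = choice_probabilities([utility_fn(c) for c in slate])
        chosen = rng.choices(slate, weights=probs, k=1)[0]  # simulated user pick
        history.append(chosen)
    return history


def toy_propose(prompt):
    """Hypothetical proposer: append a fixed set of modifiers to the prompt."""
    modifiers = ("watercolor", "photorealistic", "at sunset", "wide angle")
    return [f"{prompt}, {m}" for m in modifiers]


def toy_value(prompt):
    """Hypothetical value estimate: longer prompts score higher."""
    return len(prompt)


if __name__ == "__main__":
    for step, p in enumerate(run_session("a cat", toy_propose, toy_value, toy_value)):
        print(step, p)
```

In this sketch each turn expands the previously chosen prompt, so the history grows more specific over time; a real agent would replace `toy_value` with a learned value estimate and `toy_propose` with model-generated expansions.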

Taxonomy

Topics: Video Analysis and Summarization · Data Visualization and Analytics

Methods: Sparse Evolutionary Training