Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models

Ninareh Mehrabi; Palash Goyal; Apurv Verma; Jwala Dhamala; Varun Kumar; Qian Hu; Kai-Wei Chang; Richard Zemel; Aram Galstyan; Rahul Gupta

arXiv:2211.12503·cs.CL·November 24, 2022·5 cites

Open Access

TL;DR

This paper studies ambiguities in the prompts given to text-to-image generative models. It curates a benchmark dataset of ambiguous prompts and proposes a framework that asks the user for clarification, which automatic and human evaluations show makes generated images more faithful to user intent.

Contribution

The work introduces a new benchmark dataset for ambiguities in text-to-image models and a framework that solicits user clarifications to resolve these ambiguities.

Findings

01. The framework improves the faithfulness of images generated from ambiguous prompts.

02. Automatic and human evaluations confirm the effectiveness of the approach.

03. The dataset covers diverse ambiguity types in text-to-image generation.

Abstract

Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions and/or relying on contextual cues and common-sense knowledge, resolving ambiguities can be notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate a benchmark dataset covering different types of ambiguities that occur in these systems. We then propose a framework to mitigate ambiguities in the prompts given to the systems by soliciting clarifications from the user. Through automatic and human evaluations, we show the effectiveness of our framework in generating more faithful images aligned with human intention in the presence of ambiguities.
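The clarification idea in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the `Interpretation` structure, the `disambiguate` function, the `ask_user` callback, and the example prompt and questions are all hypothetical, chosen only to show one way a system might ask a user which reading of a prompt they intend before handing the prompt to an image generator.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Interpretation:
    """One candidate reading of an ambiguous prompt (hypothetical structure)."""
    question: str          # yes/no clarifying question shown to the user
    rewritten_prompt: str  # unambiguous prompt to use if the user answers yes


def disambiguate(
    prompt: str,
    candidates: Sequence[Interpretation],
    ask_user: Callable[[str], bool],
) -> str:
    """Return the first rewritten prompt the user confirms.

    Falls back to the original prompt if no candidate is confirmed.
    """
    for candidate in candidates:
        if ask_user(candidate.question):
            return candidate.rewritten_prompt
    return prompt


# Illustrative prompt and readings (assumed, not taken from the benchmark).
original = "The girl saw an elephant flying."
candidates = [
    Interpretation("Is the elephant flying?",
                   "A girl looking at an elephant that is flying."),
    Interpretation("Is the girl flying?",
                   "A girl on a flight looking at an elephant on the ground."),
]

# A scripted "user" stands in for interactive input.
scripted_answers = {"Is the elephant flying?": False, "Is the girl flying?": True}
result = disambiguate(original, candidates, lambda q: scripted_answers[q])
print(result)
```

In this sketch the candidate readings are written by hand. How the paper's framework generates clarifying questions or rewritten prompts is not described in the abstract, so that step is left outside the example.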

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization