Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation

Gianni Franchi; Dat Nguyen Trong; Nacim Belkhir; Guoxuan Xia; Andrea Pilzer

arXiv:2412.03178·cs.AI·December 5, 2024

TL;DR

This paper introduces PUNC, a novel method leveraging large vision-language models to quantify and evaluate uncertainty in text-to-image generation, improving understanding of model behavior and enabling applications like bias detection and OOD detection.

Contribution

The paper presents PUNC, the first prompt-based uncertainty estimation method for T2I models that disentangles aleatoric and epistemic uncertainties using caption comparison.

Findings

01. PUNC outperforms existing uncertainty estimation techniques.

02. PUNC enables semantic comparison between prompts and generated images.

03. The dataset facilitates further research in T2I uncertainty quantification.

Abstract

Uncertainty quantification in text-to-image (T2I) generative models is crucial for understanding model behavior and improving output reliability. In this paper, we are the first to quantify and evaluate the uncertainty of T2I models with respect to the prompt. Alongside adapting existing approaches designed to measure uncertainty in the image space, we also introduce Prompt-based UNCertainty Estimation for T2I models (PUNC), a novel method leveraging Large Vision-Language Models (LVLMs) to better address uncertainties arising from the semantics of the prompt and generated images. PUNC uses an LVLM to caption a generated image, and then compares the caption with the original prompt in the more semantically meaningful text space. PUNC also enables the disentanglement of aleatoric and epistemic uncertainties via precision and recall, which image-space approaches are unable to do.…
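The caption-versus-prompt comparison described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the text-similarity measure PUNC actually uses is not given in this excerpt, so plain token overlap stands in for it here, and the function names, the example prompt, and the example caption are all hypothetical. In a real pipeline the caption would come from an LVLM describing the generated image.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def caption_precision_recall(prompt, caption):
    """Token-overlap precision and recall of a caption against a prompt.

    Assumption: token overlap is an illustrative stand-in for whatever
    text-space similarity the paper uses.
    precision = share of caption tokens that also occur in the prompt
    recall    = share of prompt tokens that also occur in the caption
    """
    prompt_counts = Counter(tokenize(prompt))
    caption_counts = Counter(tokenize(caption))
    # Multiset intersection keeps the minimum count of each shared token.
    overlap = sum((prompt_counts & caption_counts).values())
    precision = overlap / max(sum(caption_counts.values()), 1)
    recall = overlap / max(sum(prompt_counts.values()), 1)
    return precision, recall


# Hypothetical example: the caption is written by hand here, standing in
# for the output of an LVLM captioning a generated image.
prompt = "a red car parked near a tree"
caption = "a red car on a street"
precision, recall = caption_precision_recall(prompt, caption)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Low recall indicates that prompt content is missing from the caption, while low precision indicates that the caption describes content the prompt never asked for. How these two scores map onto aleatoric and epistemic uncertainty is defined in the paper itself and is not assumed here.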

Taxonomy

Topics: Data Visualization and Analytics