Prompt Recovery for Image Generation Models: A Comparative Study of Discrete Optimizers

Joshua Nathaniel Williams; Avi Schwarzschild; Yutong He; J. Zico Kolter

arXiv:2408.06502·cs.CV·May 1, 2025

TL;DR

This paper compares various discrete optimization methods for recovering natural language prompts from generated images, revealing that captioner responses often outperform direct optimization in producing similar images.

Contribution

First comprehensive comparison of discrete optimizers for prompt inversion in image generation, highlighting limitations of CLIP-based metrics.

Findings

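01 Discrete optimizers effectively minimize their objectives.

02 Responses from a well-trained captioner often recover prompts whose generated images are closer to the original than those from direct optimization.

03 CLIP similarity between an inverted prompt and the ground truth image is a poor proxy for the similarity between the ground truth image and the image generated from that prompt.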

Abstract

Recovering natural language prompts for image generation models solely from the generated images is a difficult discrete optimization problem. In this work, we present the first head-to-head comparison of recent discrete optimization techniques for the problem of prompt inversion. We evaluate Greedy Coordinate Gradients (GCG), PEZ, Random Search, AutoDAN and BLIP2's image captioner across various evaluation metrics related to the quality of inverted prompts and the quality of the images generated by the inverted prompts. We find that the CLIP similarity between the inverted prompts and the ground truth image is a poor proxy for the similarity between the ground truth image and the image generated by the inverted prompts. While the discrete optimizers effectively minimize their objectives, simply using responses from a well-trained captioner often leads to generated…
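To make the "discrete optimization" framing concrete, here is a minimal sketch of a random-search loop over prompt tokens. It is not the authors' implementation: the function names, the tiny vocabulary, and `toy_score` are all hypothetical. In the setting the abstract describes, the objective would be a CLIP similarity between the candidate prompt and the target image; here a simple token-match count stands in for it so the example runs without any model.

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_search(
    vocab: Sequence[str],
    score: Callable[[List[str]], float],
    prompt_len: int = 4,
    steps: int = 200,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Maximize `score` by resampling one random token position per step."""
    rng = random.Random(seed)
    # Start from a uniformly random prompt.
    prompt = [rng.choice(vocab) for _ in range(prompt_len)]
    best = score(prompt)
    for _ in range(steps):
        candidate = list(prompt)
        candidate[rng.randrange(prompt_len)] = rng.choice(vocab)
        value = score(candidate)
        # Accept the swap only if the objective does not get worse.
        if value >= best:
            prompt, best = candidate, value
    return prompt, best


def toy_score(prompt: List[str], target=("a", "red", "fox", "running")) -> float:
    """Hypothetical stand-in for a CLIP-style objective:
    counts positions that match a hidden target prompt."""
    return float(sum(p == t for p, t in zip(prompt, target)))


if __name__ == "__main__":
    vocab = ["a", "red", "fox", "running", "blue", "dog", "the", "sleeping"]
    prompt, best = random_search(vocab, toy_score)
    print(" ".join(prompt), best)
```

The loop only ever optimizes the score it is given, which is the point behind the abstract's caution: a prompt can score well on the proxy objective without producing an image close to the original.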

Taxonomy

Topics: Medical Image Segmentation Techniques · Computer Graphics and Visualization Techniques · Advanced Vision and Imaging

Methods: Random Search · Contrastive Language-Image Pre-training