TL;DR
This paper introduces Adaptive Prompt Elicitation (APE), an interactive method that refines user prompts for text-to-image models by adaptively generating visual queries, leading to better alignment and user experience.
Contribution
It presents a novel information-theoretic framework for interactive intent inference and visual query generation to improve prompt quality in text-to-image generation.
Findings
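On IDEA-Bench and DesignBench, APE achieves stronger alignment than baseline methods, with improved efficiency.
A user study with 128 participants on user-defined tasks shows 19.8% higher perceived alignment with APE.
The alignment gain in the user study comes without increased user workload.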
Abstract
Aligning text-to-image generation with user intent remains challenging, as users frequently provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively poses visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent user intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with 128 participants on user-defined tasks demonstrates 19.8% higher perceived alignment without increased workload. Our work contributes a principled…
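The abstract describes choosing queries adaptively under an information-theoretic framework but does not give the objective itself. As a rough illustration only, the sketch below uses a standard expected-information-gain criterion over a small discrete set of intent hypotheses; this is a generic Bayesian experimental design technique and is assumed here, not taken from the paper. The names `entropy`, `posterior`, `expected_information_gain`, `select_query`, and the toy `prior` and `queries` data are all hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def posterior(prior, likelihood, answer):
    """Bayes update, where likelihood[h][answer] is P(answer | hypothesis h)."""
    joint = [prior[h] * likelihood[h][answer] for h in range(len(prior))]
    z = sum(joint)
    # If the answer has zero probability under the prior, leave beliefs unchanged.
    return [j / z for j in joint] if z > 0 else list(prior)

def expected_information_gain(prior, likelihood):
    """Prior entropy minus the expected posterior entropy over possible answers."""
    gain = entropy(prior)
    for a in range(len(likelihood[0])):
        p_a = sum(prior[h] * likelihood[h][a] for h in range(len(prior)))
        if p_a > 0:
            gain -= p_a * entropy(posterior(prior, likelihood, a))
    return gain

def select_query(prior, queries):
    """Pick the query whose answer is expected to reduce uncertainty the most."""
    return max(queries, key=lambda name: expected_information_gain(prior, queries[name]))

# Toy data (assumed): four equally likely intents, two candidate binary queries.
prior = [0.25, 0.25, 0.25, 0.25]
queries = {
    # Splits the hypotheses evenly, two versus two.
    "style": [[1, 0], [1, 0], [0, 1], [0, 1]],
    # Isolates one hypothesis and leaves the other three ambiguous.
    "color": [[1, 0], [0, 1], [0, 1], [0, 1]],
}

best = select_query(prior, queries)
```

In this toy setup the even split is preferred because it removes one full bit of uncertainty, whereas the uneven split removes less. In a system like the one the abstract describes, the hypotheses would presumably be feature requirements proposed by a language model and the answers would be user choices among generated images, but that mapping is an assumption.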
