Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho, Abhay Zala, Mohit Bansal

TL;DR
This paper introduces two visual programming frameworks: VPGen for controllable text-to-image (T2I) generation and VPEval for explainable T2I evaluation. Together they aim to improve interpretability, spatial control, and agreement with human judgments when generating images and assessing T2I models.
Contribution
The paper presents VPGen and VPEval, interpretable frameworks for T2I generation and evaluation that offer more control and explainability than existing methods.
Findings
VPGen provides stronger spatial control than state-of-the-art end-to-end T2I models.
VPEval produces evaluation results that correlate more closely with human judgments.
Both frameworks enhance interpretability and explainability in T2I tasks.
Abstract
As large language models have demonstrated impressive performance in many domains, recent works have adopted language models (LMs) as controllers of visual modules for vision-and-language tasks. While existing work focuses on equipping LMs with visual understanding, we propose two novel interpretable/explainable visual programming frameworks for text-to-image (T2I) generation and evaluation. First, we introduce VPGen, an interpretable step-by-step T2I generation framework that decomposes T2I generation into three steps: object/count generation, layout generation, and image generation. We employ an LM to handle the first two steps (object/count generation and layout generation), by finetuning it on text-layout pairs. Our step-by-step T2I generation framework provides stronger spatial control than end-to-end models, the dominant approach for this task. Furthermore, we leverage the world…
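The three-step decomposition described in the abstract (object/count generation, layout generation, image generation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every function name here is hypothetical, the keyword parser stands in for the finetuned LM, the equal-width slot placement stands in for LM-predicted layouts, and the final step only returns a plain record instead of calling a real image generator.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Bounding box as (x0, y0, x1, y1), normalized to [0, 1]. Assumed format.
Box = Tuple[float, float, float, float]

# Toy number vocabulary; a real system would rely on a finetuned LM instead.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}


@dataclass
class LayoutItem:
    name: str
    box: Box


def generate_objects(prompt: str) -> Dict[str, int]:
    """Step 1 (stand-in): extract object names and counts from the prompt."""
    tokens = prompt.lower().replace(",", " ").split()
    counts: Dict[str, int] = {}
    for prev, word in zip(tokens, tokens[1:]):
        if prev in NUMBER_WORDS:
            n = NUMBER_WORDS[prev]
            # Naive singularization for plural nouns such as "dogs".
            name = word[:-1] if n > 1 and word.endswith("s") else word
            counts[name] = counts.get(name, 0) + n
    return counts


def generate_layout(counts: Dict[str, int]) -> List[LayoutItem]:
    """Step 2 (stand-in): assign one box per object instance, left to right."""
    names = [name for name, n in counts.items() for _ in range(n)]
    if not names:
        return []
    width = 1.0 / len(names)
    return [
        LayoutItem(name, (i * width, 0.25, (i + 1) * width, 0.75))
        for i, name in enumerate(names)
    ]


def generate_image(prompt: str, layout: List[LayoutItem]) -> dict:
    """Step 3 (stand-in): a layout-conditioned generator would render here."""
    return {"prompt": prompt, "regions": [(item.name, item.box) for item in layout]}


def vpgen_sketch(prompt: str):
    """Run the three steps in order, keeping each intermediate result."""
    counts = generate_objects(prompt)
    layout = generate_layout(counts)
    image = generate_image(prompt, layout)
    return counts, layout, image


counts, layout, image = vpgen_sketch("two dogs and a cat")
print(counts)
for item in layout:
    print(item)
```

Because each step returns an inspectable intermediate result (the object counts and the boxes), a reader can see where a generation went wrong, which is the kind of interpretability the step-by-step design is meant to provide.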
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
