Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation
Wenchao Zhang, Jiahe Tian, Runze He, Jizhong Han, Jiao Dai, Miaomiao Feng, Wei Mi, Xiaodan Zhang

TL;DR
This paper introduces ABP, a new benchmark and metric for evaluating how well text-to-image models incorporate real-world knowledge beyond prompts, revealing limitations of current models and proposing a training-free knowledge injection method to improve alignment.
Contribution
The paper presents ABP, a comprehensive benchmark and ABPScore metric for assessing real-world knowledge alignment in T2I models, and introduces ITKI, a training-free strategy to enhance this alignment.
Findings
State-of-the-art T2I models struggle to integrate even simple real-world knowledge.
ABPScore correlates strongly with human judgments.
ITKI improves alignment by approximately 43% on challenging samples.
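The findings say that ABPScore correlates strongly with human judgments but do not name the statistic used. Rank correlation (Spearman's ρ) between per-image metric scores and per-image human ratings is one common way to quantify this kind of agreement. The sketch below computes it from scratch, averaging ranks for ties. Using Spearman here is an assumption, and the function names (`average_ranks`, `pearson`, `spearman`) are hypothetical illustrations, not part of the ABP release.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sxx * syy) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

A ρ near 1 means the metric orders images the same way human raters do. Rank correlation ignores the scale mismatch between, say, a 0 to 1 metric and a 1 to 5 human rating.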
Abstract
Recent text-to-image (T2I) generation models have advanced significantly, enabling the creation of high-fidelity images from textual prompts. However, existing evaluation benchmarks primarily focus on the explicit alignment between generated images and prompts, neglecting the alignment with real-world knowledge beyond prompts. To address this gap, we introduce Align Beyond Prompts (ABP), a comprehensive benchmark designed to measure the alignment of generated images with real-world knowledge that extends beyond the explicit user prompts. ABP comprises over 2,000 meticulously crafted prompts, covering real-world knowledge across six distinct scenarios. We further introduce ABPScore, a metric that utilizes existing Multimodal Large Language Models (MLLMs) to assess the alignment between generated images and world knowledge beyond prompts, which demonstrates strong correlations with human…
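The abstract says ABPScore uses existing MLLMs to judge whether a generated image agrees with world knowledge beyond the prompt, but it does not give the scoring formula. The sketch below illustrates one generic MLLM-as-judge pattern, assuming each prompt is paired with yes/no knowledge-check questions. An image scores the fraction of checks the judge answers "yes" to, and the benchmark score is the mean over images. This is not the authors' ABPScore. The `judge` callable stands in for a real MLLM query, and all names here are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical judge interface: (image, question) -> True if the MLLM answers "yes".
Judge = Callable[[object, str], bool]


def score_image(image: object, checks: Sequence[str], judge: Judge) -> float:
    """Fraction of knowledge-check questions the judge says the image satisfies."""
    if not checks:
        raise ValueError("each prompt needs at least one knowledge check")
    passed = sum(1 for question in checks if judge(image, question))
    return passed / len(checks)


def benchmark_score(samples: Sequence[tuple[object, Sequence[str]]], judge: Judge) -> float:
    """Mean per-image score over (image, checks) pairs."""
    if not samples:
        raise ValueError("no samples to score")
    return sum(score_image(img, checks, judge) for img, checks in samples) / len(samples)


# Toy usage: dicts of question -> answer stand in for images, and the
# "judge" just looks the answer up instead of querying a real MLLM.
toy_judge: Judge = lambda image, question: image.get(question, False)
toy_samples = [
    ({"check A": True, "check B": False}, ["check A", "check B"]),  # scores 0.5
    ({"check C": True}, ["check C"]),                                # scores 1.0
]
print(benchmark_score(toy_samples, toy_judge))  # mean of 0.5 and 1.0
```

Averaging per image before averaging over the benchmark keeps prompts with many checks from dominating the overall score. A real implementation would also need to parse the MLLM's free-text answer into a yes/no decision.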
Taxonomy
Topics: Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications · Biomedical Text Mining and Ontologies
Methods: Focus · ALIGN
