Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation

Wenchao Zhang; Jiahe Tian; Runze He; Jizhong Han; Jiao Dai; Miaomiao Feng; Wei Mi; Xiaodan Zhang

arXiv:2505.18730 · cs.CV · May 27, 2025

TL;DR

This paper introduces ABP, a new benchmark and metric for evaluating how well text-to-image models incorporate real-world knowledge beyond prompts, revealing limitations of current models and proposing a training-free knowledge injection method to improve alignment.

Contribution

The paper presents ABP, a comprehensive benchmark for assessing real-world knowledge alignment in T2I models, along with the ABPScore metric, and introduces ITKI, a training-free strategy for enhancing this alignment.
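The page does not say how ITKI works, so the sketch below does not reconstruct it. It only illustrates what "training-free" means in this setting. It is a minimal Python sketch, assuming a generic loop in which a frozen generator is re-prompted with knowledge statements that a judge found missing from the previous image. The names `inject_knowledge`, `generate`, `judge`, `facts`, and `max_rounds` are hypothetical, and the generator and judge are caller-supplied stand-ins for a T2I model and an MLLM.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a frozen T2I model and an MLLM yes/no judge.
Generator = Callable[[str], object]    # prompt -> image
Judge = Callable[[object, str], bool]  # (image, knowledge fact) -> satisfied?


def inject_knowledge(
    prompt: str,
    facts: List[str],
    generate: Generator,
    judge: Judge,
    max_rounds: int = 3,
) -> Tuple[str, object]:
    """Assumed generic loop, NOT the paper's ITKI.

    Each round appends the facts the judge found missing from the
    latest image to the original prompt and regenerates. The generator
    is only called, never updated, so no training is involved.
    """
    current = prompt
    image = generate(current)
    for _ in range(max_rounds):
        missing = [f for f in facts if not judge(image, f)]
        if not missing:
            break  # every fact is already reflected in the image
        current = prompt + " " + " ".join(missing)
        image = generate(current)
    return current, image
```

Because only the prompt text changes between rounds and no weights are updated, a loop of this shape is training-free; whether ITKI operates on the prompt, on intermediate representations, or elsewhere is not stated on this page.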

Findings

01. State-of-the-art models struggle to integrate simple real-world knowledge.
02. ABPScore correlates strongly with human judgments.
03. ITKI improves alignment by approximately 43% on challenging samples.
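The page does not state which correlation coefficient supports finding 02, or whether the 43% in finding 03 is a relative or an absolute gain. Assuming a rank correlation and a relative gain, both quantities can be computed as in this minimal sketch. `average_ranks`, `pearson`, `spearman`, and `relative_gain` are illustrative helpers, not code from the ABP repository.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(metric: Sequence[float], human: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(metric) != len(human):
        raise ValueError("inputs must have equal length")
    return pearson(average_ranks(metric), average_ranks(human))


def relative_gain(before: float, after: float) -> float:
    """(after - before) / before; 0.43 would read as a 43% relative gain."""
    return (after - before) / before
```

On real data, `scipy.stats.spearmanr` computes the same coefficient and also returns a p-value; the stdlib version above only makes the ranking step explicit.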

Abstract

Recent text-to-image (T2I) generation models have advanced significantly, enabling the creation of high-fidelity images from textual prompts. However, existing evaluation benchmarks primarily focus on the explicit alignment between generated images and prompts, neglecting the alignment with real-world knowledge beyond prompts. To address this gap, we introduce Align Beyond Prompts (ABP), a comprehensive benchmark designed to measure the alignment of generated images with real-world knowledge that extends beyond the explicit user prompts. ABP comprises over 2,000 meticulously crafted prompts, covering real-world knowledge across six distinct scenarios. We further introduce ABPScore, a metric that utilizes existing Multimodal Large Language Models (MLLMs) to assess the alignment between generated images and world knowledge beyond prompts, which demonstrates strong correlations with human…
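The abstract is truncated before it specifies how ABPScore is computed. A minimal sketch, assuming each prompt carries a list of world-knowledge checks (statements implied by, but not written in, the prompt) and that an MLLM answers yes or no for each check, could score an image as the fraction of checks satisfied and then average scores per scenario. `BenchmarkItem`, `knowledge_checks`, `item_score`, and `scenario_scores` are hypothetical names, not the schema or API of the ABP repository, and the judge is a caller-supplied stand-in for an MLLM query.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Sequence

Judge = Callable[[object, str], bool]  # stand-in for an MLLM yes/no query


@dataclass
class BenchmarkItem:
    # Hypothetical record layout, assumed for illustration only.
    prompt: str
    scenario: str  # label for one of the benchmark's six scenarios
    knowledge_checks: List[str] = field(default_factory=list)


def item_score(image: object, item: BenchmarkItem, judge: Judge) -> float:
    """Fraction of knowledge checks the judge says the image satisfies."""
    if not item.knowledge_checks:
        raise ValueError("item has no knowledge checks")
    hits = sum(judge(image, q) for q in item.knowledge_checks)
    return hits / len(item.knowledge_checks)


def scenario_scores(
    images: Sequence[object], items: Sequence[BenchmarkItem], judge: Judge
) -> Dict[str, float]:
    """Mean item score per scenario label."""
    if len(images) != len(items):
        raise ValueError("need exactly one image per benchmark item")
    per_scenario: Dict[str, List[float]] = defaultdict(list)
    for image, item in zip(images, items):
        per_scenario[item.scenario].append(item_score(image, item, judge))
    return {s: mean(v) for s, v in per_scenario.items()}
```

Reporting a mean per scenario keeps each scenario visible on its own instead of letting the largest one dominate a single overall number.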

Code & Models

Repositories

smile365317/abp (official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications · Biomedical Text Mining and Ontologies

Methods: Focus · ALIGN