Beyond Words and Pixels: A Benchmark for Implicit World Knowledge Reasoning in Generative Models

Tianyang Han; Junhao Su; Junjie Hu; Peizhen Yang; Hengyu Shi; Junfeng Luo; Jialin Gao

arXiv:2511.18271 · cs.CV · December 12, 2025

TL;DR

This paper introduces PicWorld, a comprehensive benchmark for evaluating the ability of text-to-image models to understand implicit world knowledge and physical reasoning, revealing significant limitations in current models.

Contribution

The paper presents PicWorld, a benchmark of 1,100 prompts targeting implicit world knowledge and physical causal reasoning, together with PW-Agent, a multi-agent evaluator for fine-grained assessment of T2I models' reasoning and knowledge grounding.

Findings

01

Current T2I models struggle with implicit world knowledge.

02

Models show limited physical causal reasoning.

03

The results highlight the need for reasoning-aware architectures.

Abstract

Text-to-image (T2I) models today are capable of producing photorealistic, instruction-following images, yet they still frequently fail on prompts that require implicit world knowledge. Existing evaluation protocols either emphasize compositional alignment or rely on single-round VQA-based scoring, leaving critical dimensions such as knowledge grounding, multi-physics interactions, and auditable evidence substantially undertested. To address these limitations, we introduce PicWorld, the first comprehensive benchmark that assesses T2I models' grasp of implicit world knowledge and physical causal reasoning. The benchmark consists of 1,100 prompts across three core categories. To facilitate fine-grained evaluation, we propose PW-Agent, an evidence-grounded multi-agent evaluator that hierarchically assesses images on their physical realism and logical consistency by decomposing prompts…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Ethics and Social Impacts of AI