PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models
Fanqing Meng, Wenqi Shao, Lixin Luo, Yahong Wang, Yiran Chen, Quanfeng Lu, Yue Yang, Tianshuo Yang, Kaipeng Zhang, Yu Qiao, Ping Luo

TL;DR
This paper introduces PhyBench, a new benchmark dataset for evaluating text-to-image models' understanding of physical commonsense across various scenarios, revealing current models' limitations and proposing evaluation improvements.
Contribution
The paper presents PhyBench, a comprehensive dataset for assessing physical commonsense in T2I models, and demonstrates the need for models to incorporate physical reasoning.
Findings
Even advanced models often fail to generate physically plausible images in most scenarios, with optics as the notable exception.
GPT-4o is an effective evaluator of physical correctness, and its judgments align with human assessments.
Current T2I models lack deep physical reasoning capabilities.
Abstract
Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and everyday tasks. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge, particularly physical commonsense. To address this issue, we introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties, encompassing 31 distinct physical scenarios. We assess 6 prominent T2I models, including proprietary models DALLE3 and Gemini, and demonstrate that incorporating physical principles into prompts enhances the models' ability to generate physically accurate…
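The abstract describes the benchmark's shape (700 prompts split across 4 primary categories and 31 scenarios) but not its file format or scoring code. The sketch below shows one way to organize per-prompt results and report a mean score per category. It rests on assumptions: the `PromptResult` record, its field names, the [0, 1] score range, and the `per_category_scores` helper are all hypothetical and do not reflect the authors' released format. In practice, scores would come from a judge such as GPT-4o or from human raters.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four primary categories named in the abstract.
CATEGORIES = ("mechanics", "optics", "thermodynamics", "material properties")


@dataclass
class PromptResult:
    """One evaluated prompt (hypothetical record layout, not the official format)."""
    prompt: str
    category: str   # one of CATEGORIES
    scenario: str   # one of the 31 physical scenarios (placeholder names here)
    score: float    # judge score for the generated image, assumed to lie in [0, 1]


def per_category_scores(results):
    """Return the mean judge score for each category that has at least one result."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in results:
        if r.category not in CATEGORIES:
            raise ValueError(f"unknown category: {r.category}")
        totals[r.category] += r.score
        counts[r.category] += 1
    return {c: totals[c] / counts[c] for c in CATEGORIES if counts[c]}


if __name__ == "__main__":
    # Toy records with made-up scores, for illustration only (not results from the paper).
    toy = [
        PromptResult("a ball rolling down a ramp", "mechanics", "placeholder-1", 0.0),
        PromptResult("a straw in a glass of water", "optics", "placeholder-2", 1.0),
        PromptResult("an ice cube left in the sun", "thermodynamics", "placeholder-3", 0.5),
        PromptResult("an iron ball dropped into water", "material properties", "placeholder-4", 0.0),
        PromptResult("a pencil reflected in a mirror", "optics", "placeholder-5", 0.5),
    ]
    print(per_category_scores(toy))
    # prints {'mechanics': 0.0, 'optics': 0.75, 'thermodynamics': 0.5, 'material properties': 0.0}
```

Reporting per category rather than as a single overall number makes patterns like the optics-versus-everything-else gap described in the Findings visible at a glance.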
