PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

Fanqing Meng; Wenqi Shao; Lixin Luo; Yahong Wang; Yiran Chen; Quanfeng Lu; Yue Yang; Tianshuo Yang; Kaipeng Zhang; Yu Qiao; Ping Luo

arXiv:2406.11802 · cs.CV · September 24, 2024

TL;DR

This paper introduces PhyBench, a new benchmark dataset for evaluating text-to-image models' understanding of physical commonsense across various scenarios, revealing current models' limitations and proposing evaluation improvements.

Contribution

The paper presents PhyBench, a comprehensive dataset for assessing physical commonsense in T2I models, and demonstrates the need for models to incorporate physical reasoning.

Findings

01 Even advanced models often fail on physical scenarios, with optics as the exception.

02 GPT-4o effectively evaluates models' understanding of physical commonsense, aligning with human judgment.

03 Current T2I models lack deep reasoning about physical commonsense.

Abstract

Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and everyday tasks. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge, particularly physical commonsense. To address this issue, we introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties, encompassing 31 distinct physical scenarios. We assess 6 prominent T2I models, including proprietary models DALLE3 and Gemini, and demonstrate that incorporating physical principles into prompts enhances the models' ability to generate physically accurate…
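The benchmark layout described in the abstract (prompts grouped under four primary categories, with physical principles optionally added to prompts) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `PromptItem` fields, the `augment_with_principle` text format, the scenario labels, and the scores are all hypothetical placeholders, and none of the values are results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# The four primary categories named in the abstract.
CATEGORIES = ("mechanics", "optics", "thermodynamics", "material properties")


@dataclass(frozen=True)
class PromptItem:
    """One benchmark entry (hypothetical schema, assumed for illustration)."""
    category: str  # one of CATEGORIES
    scenario: str  # placeholder scenario label
    prompt: str    # text given to the T2I model


def augment_with_principle(item, principle):
    """Append an explicit physical principle to a prompt (assumed format)."""
    return f"{item.prompt}. Physical principle: {principle}"


def mean_score_by_category(items, scores):
    """Average one score per item within each category."""
    if len(items) != len(scores):
        raise ValueError("items and scores must have the same length")
    buckets = defaultdict(list)
    for item, score in zip(items, scores):
        if item.category not in CATEGORIES:
            raise ValueError(f"unknown category: {item.category}")
        buckets[item.category].append(score)
    return {category: mean(values) for category, values in buckets.items()}


# Placeholder items and scores, invented purely to exercise the functions.
items = [
    PromptItem("mechanics", "example-scenario-a", "A hypothetical mechanics prompt"),
    PromptItem("mechanics", "example-scenario-b", "Another hypothetical mechanics prompt"),
    PromptItem("optics", "example-scenario-c", "A hypothetical optics prompt"),
]
placeholder_scores = [1.0, 3.0, 2.0]

summary = mean_score_by_category(items, placeholder_scores)
print(summary)
```

Grouping scores by category is what makes a per-category comparison possible, such as the contrast between optics and the other categories noted in the findings; the scoring scale itself is left unspecified here because the truncated abstract does not state it.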

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Handwritten Text Recognition Techniques

Methods: Softmax · Attention Is All You Need · Focus