PAI-Bench: A Comprehensive Benchmark For Physical AI
Fengzhe Zhou, Jiannan Huang, Jialuo Li, Deva Ramanan, Humphrey Shi

TL;DR
PAI-Bench is a comprehensive benchmark for evaluating the perception and prediction capabilities of models in Physical AI. It shows that current models remain limited in physical coherence and causal reasoning.
Contribution
Introduces PAI-Bench, a unified benchmark for assessing physical perception and prediction. It covers video generation, conditional video generation, and video understanding, using 2,808 real-world cases and task-aligned metrics.
Findings
Video generative models lack physical coherence despite visual fidelity.
Multi-modal large language models show limited forecasting and causal reasoning.
Current models are still at an early stage in handling Physical AI perception and prediction tasks.
Abstract
Physical AI aims to develop models that can perceive and predict real-world dynamics; yet, the extent to which current multi-modal large language models and video generative models support these abilities is insufficiently understood. We introduce Physical AI Bench (PAI-Bench), a unified and comprehensive benchmark that evaluates perception and prediction capabilities across video generation, conditional video generation, and video understanding, comprising 2,808 real-world cases with task-aligned metrics designed to capture physical plausibility and domain-specific reasoning. Our study provides a systematic assessment of recent models and shows that video generative models, despite strong visual fidelity, often struggle to maintain physically coherent dynamics, while multi-modal large language models exhibit limited performance in forecasting and causal interpretation. These…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)
