PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement
Tianyidan Xie, Peiyu Wang, Yuyi Qian, Yuxuan Wang, Rui Ma, Ying Tai, Song Wu, Qian Wang, Lanjun Wang, and Zili Yi

TL;DR
PhysCodeBench is a new benchmark for evaluating physics-aware symbolic simulation of 3D scenes, and SMRF is a multi-agent framework that significantly improves simulation accuracy.
Contribution
Introduces PhysCodeBench, the first comprehensive benchmark for physics-aware symbolic simulation, and proposes SMRF, a multi-agent refinement framework that enhances simulation accuracy.
Findings
SMRF scores 67.7 points, compared with 36.3 points for SOTA models.
Error correction is crucial for accurate physics simulation.
Specialized multi-agent approaches outperform single-agent methods.
Abstract
Physics-aware symbolic simulation of 3D scenes is critical for robotics, embodied AI, and scientific computing, requiring models to understand natural language descriptions of physical phenomena and translate them into executable simulation environments. While large language models (LLMs) excel at general code generation, they struggle with the semantic gap between physical descriptions and simulation implementation. We introduce PhysCodeBench, the first comprehensive benchmark for evaluating physics-aware symbolic simulation, comprising 700 manually crafted, diverse samples across mechanics, fluid dynamics, and soft-body physics with expert annotations. Our evaluation framework measures both code executability and physical accuracy through automated and visual assessment. Building on this, we propose a Self-Corrective Multi-Agent Refinement Framework (SMRF) with three specialized agents…
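To make the "code executability" half of the evaluation and the general idea of error-driven correction concrete, here is a minimal sketch in Python. It is not the authors' SMRF implementation: the abstract above is truncated before it describes the three agents, so the `repair` callable is a hypothetical stand-in for whatever produces a corrected script, and `run_candidate` and `refine` are illustrative names. Physical accuracy, which the benchmark also measures, is not covered here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def run_candidate(code: str, timeout_s: float = 10.0) -> Tuple[bool, str]:
    """Run a candidate script in a fresh interpreter.

    Returns (ok, stderr). A non-zero exit code or a timeout counts as a failure.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        return proc.returncode == 0, proc.stderr


def refine(
    code: str,
    repair: Callable[[str, str], str],  # hypothetical: (code, error) -> new code
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Execute, and on failure pass the error text to `repair`, up to max_rounds times."""
    for _ in range(max_rounds):
        ok, err = run_candidate(code)
        if ok:
            return code, True
        code = repair(code, err)
    # Check the last repaired version once more before giving up.
    ok, _ = run_candidate(code)
    return code, ok
```

In practice `repair` would wrap a model call that receives the failing script together with its traceback; here it is left as a plain function so the loop can be exercised without any model.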
