PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement

Tianyidan Xie; Peiyu Wang; Yuyi Qian; Yuxuan Wang; Rui Ma; Ying Tai; Song Wu; Qian Wang; Lanjun Wang; and Zili Yi

arXiv:2604.23580·cs.RO·April 28, 2026

TL;DR

PhysCodeBench is a new benchmark for evaluating physics-aware symbolic simulation of 3D scenes, and SMRF is a multi-agent framework that significantly improves simulation accuracy.

Contribution

Introduces PhysCodeBench, the first comprehensive benchmark for physics-aware symbolic simulation, and proposes SMRF, a multi-agent refinement framework that enhances simulation accuracy.

Findings

01

SMRF scores 67.7 points, versus 36.3 points for state-of-the-art (SOTA) models.

02

Error correction is crucial for accurate physics simulation.

03

Specialized multi-agent approaches outperform single-agent methods.

Abstract

Physics-aware symbolic simulation of 3D scenes is critical for robotics, embodied AI, and scientific computing, requiring models to understand natural language descriptions of physical phenomena and translate them into executable simulation environments. While large language models (LLMs) excel at general code generation, they struggle with the semantic gap between physical descriptions and simulation implementation. We introduce PhysCodeBench, the first comprehensive benchmark for evaluating physics-aware symbolic simulation, comprising 700 manually crafted, diverse samples across mechanics, fluid dynamics, and soft-body physics with expert annotations. Our evaluation framework measures both code executability and physical accuracy through automated and visual assessment. Building on this, we propose a Self-Corrective Multi-Agent Refinement Framework (SMRF) with three specialized agents…
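
The abstract names code executability as one of the two things the evaluation framework measures. The following is a minimal sketch of what such a check could look like. It is not the benchmark's actual harness, which this page does not describe. The function names `runs_cleanly` and `executability_rate`, the 10-second timeout, and the "exit status 0 means executable" criterion are all assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def runs_cleanly(code: str, timeout_s: float = 10.0) -> bool:
    """Return True if `code` exits with status 0 within the timeout.

    Hypothetical criterion: the real benchmark may define executability
    differently (e.g. by also checking simulator output).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,  # keep candidate output off our stdout
                timeout=timeout_s,
                cwd=tmp,              # isolate any files the script writes
            )
        except subprocess.TimeoutExpired:
            # A script that never finishes counts as not executable.
            return False
        return result.returncode == 0


def executability_rate(samples: list[str]) -> float:
    """Fraction of candidate scripts that execute without error."""
    if not samples:
        return 0.0
    return sum(runs_cleanly(s) for s in samples) / len(samples)
```

A check like this only covers the executability half of the evaluation. Physical accuracy, which the abstract says is assessed through automated and visual means, would need a separate comparison against expected physical behavior.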
