Your Simulation Runs but Solves the Wrong Physics: PDE-Grounded Intent Verification for LLM-Generated Multiphysics Simulation Code

Zhenghan Song; Yulong Liu; Cheng Wan; Chenjun Li; Lingfu Liu; Yunyi Li; Congcong Yuan

arXiv:2605.09360·cs.LG·May 12, 2026

TL;DR

This paper introduces a PDE-grounded verification method for LLM-generated scientific simulation code. It checks that the code encodes the intended physics, not merely that it executes successfully.

Contribution

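The paper formalizes the Intent Fidelity Score (IFS), a structural metric for whether generated code encodes the intended physics, and develops a PDE-grounded refinement loop that uses deterministic violation reports to correct the generated code iteratively.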

Findings

01. Mean IFS improves over direct generation on MooseBench

02. Refinement significantly boosts IFS on hard cases

03. Execution success does not guarantee correct physics encoding

Abstract

Execution-based evaluation of LLM-generated code implicitly treats successful execution as a proxy for correctness. In scientific simulation, this proxy is insufficient: a generated input file can run, mesh, and converge while encoding governing equations that differ from the user's intent. We call this mismatch between intended physics and generated code the comprehension-generation gap. We instantiate this in MOOSE, where Kernel and BC objects map compositionally to weak-form residual terms, enabling deterministic reconstruction of the encoded PDE and comparison against an intended contract. We formalize this comparison as the Intent Fidelity Score (IFS), a structural metric covering governing terms, BCs, ICs, coefficients, and time scheme. Building on IFS, we develop a PDE-grounded refinement loop that uses deterministic violation reports to correct generated code iteratively. We…
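
To make the idea of a structural comparison concrete, here is a minimal Python sketch of scoring an encoded PDE against an intended contract. It is an illustration under stated assumptions, not the paper's definition of IFS: the five categories follow the abstract, but the equal weighting, the Jaccard overlap per category, the string encoding of terms, and every function and variable name below are hypothetical. `Diffusion`, `TimeDerivative`, and `BodyForce` are real MOOSE kernel types; the lookup table that maps them to weak-form terms is a simplified stand-in for deterministic reconstruction.

```python
# Hypothetical lookup: MOOSE kernel type -> weak-form residual term it contributes.
# (v is the test function; the string form is an assumption made for this sketch.)
KERNEL_TERMS = {
    "Diffusion": "(grad u, grad v)",
    "TimeDerivative": "(du/dt, v)",
    "BodyForce": "-(f, v)",
}

# Categories named in the abstract; equal weighting is an assumption.
CATEGORIES = ("terms", "bcs", "ics", "coefficients", "time_scheme")


def reconstruct_terms(kernel_types):
    """Map kernel type names to the set of weak-form terms they encode."""
    return {KERNEL_TERMS[k] for k in kernel_types if k in KERNEL_TERMS}


def violation_report(intended, encoded):
    """List every item that is missing from, or unexpected in, the encoded PDE."""
    report = []
    for cat in CATEGORIES:
        want = intended.get(cat, set())
        got = encoded.get(cat, set())
        report += [f"{cat}: missing {item}" for item in sorted(want - got)]
        report += [f"{cat}: unexpected {item}" for item in sorted(got - want)]
    return report


def fidelity_score(intended, encoded):
    """Mean Jaccard overlap across categories (a stand-in, not the paper's IFS)."""
    scores = []
    for cat in CATEGORIES:
        want = intended.get(cat, set())
        got = encoded.get(cat, set())
        union = want | got
        scores.append(1.0 if not union else len(want & got) / len(union))
    return sum(scores) / len(scores)


# Intended physics: transient diffusion with fixed values on both ends.
intended = {
    "terms": reconstruct_terms(["TimeDerivative", "Diffusion"]),
    "bcs": {"DirichletBC left u=0", "DirichletBC right u=1"},
    "ics": {"u=0"},
    "coefficients": {"k=1"},
    "time_scheme": {"implicit-euler"},
}

# Generated input: it would run and converge, but it dropped the time
# derivative and solves a steady problem instead.
generated = {
    "terms": reconstruct_terms(["Diffusion"]),
    "bcs": {"DirichletBC left u=0", "DirichletBC right u=1"},
    "ics": {"u=0"},
    "coefficients": {"k=1"},
    "time_scheme": {"steady"},
}

score = fidelity_score(intended, generated)
report = violation_report(intended, generated)
```

In this toy case the generated input would pass an execution-based check, yet the score falls below 1 and the report names the missing time-derivative term and the wrong time scheme. A report of this kind is the sort of deterministic feedback a refinement loop could hand back to the model.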
