SimBench: A Framework for Evaluating and Diagnosing LLM-Based Digital-Twin Generation for Multi-Physics Simulation
Jingquan Wang, Andrew Negrut, Hongyu Wang, Harry Zhang, Dan Negrut

TL;DR
SimBench is a comprehensive benchmark framework that evaluates the ability of simulator-oriented large language models to generate high-quality digital twins for multi-physics simulation, using an LLM-based judging system and multi-physics testing with Chrono.
Contribution
This paper introduces SimBench, a benchmarking approach for assessing the digital-twin generation capabilities of simulator-oriented LLMs (S-LLMs), built around an LLM-as-a-judge (J-LLM) evaluation method and demonstrated with the Chrono multi-physics simulator.
Findings
Over 33 open- and closed-source S-LLMs were compared using SimBench.
J-LLM provides consistent scoring for digital twins.
Framework applicable to various simulation platforms.
Abstract
We introduce SimBench, a benchmark designed to evaluate the proficiency of simulator-oriented LLMs (S-LLMs) in generating digital twins (DTs) that can be used in simulators for virtual testing. Given a collection of S-LLMs, this benchmark ranks them according to their ability to produce high-quality DTs. We demonstrate this by comparing over 33 open- and closed-source S-LLMs. Using multi-turn interactions, SimBench employs an LLM-as-a-judge (J-LLM) that leverages both predefined rules and human-in-the-loop guidance to assign scores for the DTs generated by the S-LLM, thus providing a consistent and expert-inspired evaluation protocol. The J-LLM is specific to a simulator, and herein the proposed benchmarking approach is demonstrated in conjunction with the open-source Chrono multi-physics simulator. Chrono provided the backdrop used to assess an S-LLM in relation to the latter's ability…
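The ranking step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the SimBench implementation: the score scale, the choice to average scores per model, the model names, and the helper `rank_s_llms` are all hypothetical, and the numbers are placeholders invented for the example.

```python
from statistics import mean


def rank_s_llms(judge_scores):
    """Rank models by mean judge score, highest first.

    judge_scores maps a model name to the list of scores a judge assigned
    to that model's generated digital twins (one score per turn or task).
    Models with no scores are skipped.
    """
    averages = {
        name: mean(scores) for name, scores in judge_scores.items() if scores
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores, invented for illustration only (not results from the paper).
example_scores = {
    "model_a": [60.0, 70.0, 80.0],
    "model_b": [90.0, 85.0, 95.0],
    "model_c": [50.0, 55.0, 45.0],
}

ranking = rank_s_llms(example_scores)
for position, (name, avg) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {avg:.1f}")
```

Averaging is only one possible aggregation; a benchmark could equally weight later turns more heavily or report per-task scores separately.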
Taxonomy
Topics: Digital Transformation in Industry · Data Quality and Management · Big Data and Business Intelligence
