Generative Digital Twins: Vision-Language Simulation Models for Executable Industrial Systems
YuChe Hsu, AnJui Wang, TsaiChing Ni, YuanFu Yang

TL;DR
This paper introduces a Vision-Language Simulation Model (VLSM) for industrial systems that synthesizes executable code from sketches and prompts, supported by a large dataset and new evaluation metrics, advancing multimodal industrial simulation.
Contribution
It presents the first large-scale dataset and novel evaluation metrics for generative digital twins, integrating visual and textual reasoning for executable industrial simulations.
Findings
The models achieve near-perfect structural accuracy.
Generated simulations execute with high robustness.
The work establishes a foundational framework for multimodal industrial digital twins.
Abstract
We propose a Vision-Language Simulation Model (VLSM) that unifies visual and textual understanding to synthesize executable FlexScript from layout sketches and natural-language prompts, enabling cross-modal reasoning for industrial simulation systems. To support this new paradigm, the study constructs the first large-scale dataset for generative digital twins, comprising over 120,000 prompt-sketch-code triplets that enable multimodal learning between textual descriptions, spatial structures, and simulation logic. In parallel, three novel evaluation metrics, Structural Validity Rate (SVR), Parameter Match Rate (PMR), and Execution Success Rate (ESR), are proposed specifically for this task to comprehensively evaluate structural integrity, parameter fidelity, and simulator executability. Through systematic ablation across vision encoders, connectors, and code-pretrained language…
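The abstract names the three metrics but this excerpt does not give their formulas. As a rough illustration only, the sketch below treats each metric as a simple rate over evaluated samples: SVR as the share of outputs with a valid structure, PMR as the mean fraction of reference parameters reproduced exactly, and ESR as the share of outputs that run in the simulator. The `EvalRecord` schema, the exact-match rule for parameters, and the function names are assumptions made for this example, not definitions taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One evaluated sample. Hypothetical schema, not taken from the paper."""
    structure_valid: bool    # generated script has the expected structure
    reference_params: dict   # ground-truth parameter name -> value
    generated_params: dict   # parameter name -> value parsed from the output
    executed: bool           # script ran in the simulator without error


def param_match_fraction(reference: dict, generated: dict) -> float:
    """Fraction of reference parameters reproduced exactly (assumed rule)."""
    if not reference:
        return 1.0
    hits = sum(
        1 for name, value in reference.items()
        if name in generated and generated[name] == value
    )
    return hits / len(reference)


def score(records: list) -> dict:
    """Aggregate assumed SVR, PMR, and ESR as plain averages over samples."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to score")
    svr = sum(r.structure_valid for r in records) / n
    pmr = sum(
        param_match_fraction(r.reference_params, r.generated_params)
        for r in records
    ) / n
    esr = sum(r.executed for r in records) / n
    return {"SVR": svr, "PMR": pmr, "ESR": esr}
```

Reporting the three rates separately shows where a model fails: an output can be structurally valid yet carry wrong parameter values, or match every parameter and still fail to execute.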
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · BIM and Construction Integration
