SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
Yanchang Liang, Xiaowei Zhao

TL;DR
SimuAgent is an LLM-powered tool that improves Simulink modeling by using concise representations, a specialized training architecture, and a novel reinforcement learning method, leading to faster, more accurate, and privacy-preserving engineering workflows.
Contribution
The paper introduces SimuAgent, a novel LLM-based Simulink modeling assistant with a two-stage training architecture and Reflection-GRPO, enhancing modeling accuracy and convergence speed in industrial engineering tasks.
Findings
Outperforms standard RL baselines in SimuBench
Surpasses GPT-4o with few-shot prompting on the same benchmark
Ablation studies confirm benefits of curriculum and data augmentation
Abstract
Large language models (LLMs) have revolutionized text-based code automation, but their potential in graph-oriented engineering workflows remains under-explored. We introduce SimuAgent, an LLM-powered modeling and simulation agent tailored for Simulink. SimuAgent replaces verbose XML with a concise, dictionary-style Python representation, dramatically cutting token counts, improving interpretability, and enabling fast, in-process simulation. A lightweight plan-execute architecture, trained in two stages, equips the agent with both low-level tool skills and high-level design reasoning. To tackle sparse rewards in long-horizon tasks, we propose Reflection-GRPO (ReGRPO), which augments Group Relative Policy Optimization (GRPO) with self-reflection traces that supply rich intermediate feedback, accelerating convergence and boosting robustness. Experiments on SimuBench, our newly released…
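To make the representation claim in the abstract concrete, here is a minimal sketch comparing a dictionary-style model description with an XML rendering of the same diagram. The schema (`blocks`, `connections`, `params`), the helper names, and the XML layout are all hypothetical illustrations. They are not SimuAgent's actual format and not the real Simulink file format. Character count is used only as a rough proxy for token count.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical dictionary-style description of a three-block diagram.
# The key names are assumptions made for this sketch.
model = {
    "blocks": {
        "Step": {"type": "Step", "params": {"Time": 1.0}},
        "Gain": {"type": "Gain", "params": {"Gain": 2.5}},
        "Scope": {"type": "Scope", "params": {}},
    },
    "connections": [
        ("Step/1", "Gain/1"),
        ("Gain/1", "Scope/1"),
    ],
}


def to_xml(model):
    """Render the same diagram as simplified XML (layout is illustrative only)."""
    root = ET.Element("Model")
    for name, block in model["blocks"].items():
        node = ET.SubElement(root, "Block", BlockType=block["type"], Name=name)
        for key, value in block["params"].items():
            ET.SubElement(node, "P", Name=key).text = str(value)
    for src, dst in model["connections"]:
        line = ET.SubElement(root, "Line")
        ET.SubElement(line, "P", Name="Src").text = src
        ET.SubElement(line, "P", Name="Dst").text = dst
    return ET.tostring(root, encoding="unicode")


def dangling_endpoints(model):
    """Return connection endpoints that refer to a block that is not defined."""
    names = set(model["blocks"])
    return [
        end
        for conn in model["connections"]
        for end in conn
        if end.split("/")[0] not in names
    ]


compact = json.dumps(model, separators=(",", ":"))
verbose = to_xml(model)
print(len(compact), len(verbose))
print(dangling_endpoints(model))
```

Because the dictionary is an ordinary Python object, a structural check such as `dangling_endpoints` can run in-process without parsing a file, which is one plausible reading of why such a representation is easier to inspect and validate.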
Peer Reviews
Decision: Submitted to ICLR 2026
- The paper effectively frames the problem, shows how previous methods (XML) lead to a large number of tokens, and showcases Python-dictionary representation as a suitable choice
- Reflection and retry is a simple mechanism to tackle the sparse reward issue of just having the output of 0/1 at the end of the episode.
- The SimuBench dataset provides examples over various system-design domains.
- The paper is well written, has done extensive experiments, with multiple ablations and transfer to ot
- The algorithm is only compared with GRPO. How does the method compare to other baselines for LLM tool-use and RL?
- Improvements on generic NLP benchmarks are small, code-based tasks show more gain, but SimuBench is the setting where reflection is most helpful.
- More methodological clarifications on reward structure, prompt differences for image-based inputs are needed.
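The sparse 0/1 end-of-episode reward raised in the reviews above can be illustrated with the group-relative advantage that GRPO computes: each rollout's reward is standardized against the other rollouts sampled for the same prompt. The sketch below uses the population standard deviation and a small `eps`, both implementation choices assumed here. The function name and the example reward groups are hypothetical, and nothing here reproduces the paper's ReGRPO procedure.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against its own group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group where every rollout fails: the advantages are all zero,
# so this group contributes no learning signal.
all_failed = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# A group where one rollout succeeds (for illustration, imagine a retry
# that fixed an earlier error): the success gets a positive advantage
# and the failures get negative ones.
one_success = group_relative_advantages([0.0, 0.0, 0.0, 1.0])

print(all_failed)
print(one_success)
```

The contrast between the two groups shows why a binary terminal reward is a problem on hard tasks: until at least one rollout in a group succeeds, the update carries no information, which is the gap that intermediate feedback is meant to fill.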
- The integration of Reflection-GRPO with Simulink tool feedback is a notable contribution. The agent leverages intermediate reflection traces and programmatic validation signals (e.g., structural checks, execution feedback, block-level errors) to guide long-horizon updates. This mechanism improves sample efficiency, stabilizes training under sparse rewards, and provides a general recipe for scaling RLHF-style methods to complex tool-using domains beyond text-only reasoning.
- The Python-dictio
- The Introduction section is very well-written and effectively motivates the need for an automation agent for Simulink. However, the proposed method and experimental sections lack critical implementation details and could be substantially improved through better organization. For instance, in the architecture description (Section 3), it would be far more informative if the pipeline stages were presented sequentially, explaining the order of operations and data flow, rather than only listing the
- Interesting problem, certainly high industry impact
- Basically zero scientific novelty. This is an engineering project without many generalizable takeaways.
- Presentation is inconsistent and unclear what the actual contribution is: toolbox, method, architecture, benchmark... All of these are claimed in the paper, but unclear which one is it. For some reason, it is claimed that a "Python-based model representation," which is a dictionary, is a contribution. Certainly not for a top conference. It supposedly improves interpretability. This obvious
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Model-Driven Software Engineering Techniques
