SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning

Yanchang Liang; Xiaowei Zhao

arXiv:2601.05187·cs.AI·January 9, 2026

Open Access 3 Reviews

TL;DR

SimuAgent is an LLM-powered tool that improves Simulink modeling by using concise representations, a specialized training architecture, and a novel reinforcement learning method, leading to faster, more accurate, and privacy-preserving engineering workflows.

Contribution

The paper introduces SimuAgent, a novel LLM-based Simulink modeling assistant with a two-stage training architecture and Reflection-GRPO, enhancing modeling accuracy and convergence speed in industrial engineering tasks.
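
For context on the GRPO half of Reflection-GRPO, the sketch below shows the group-relative advantage normalization commonly used in GRPO, assuming a binary end-of-episode reward. The function name and the `eps` value are illustrative and not taken from the paper, and the reflection mechanism itself is not modeled here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own sampling group.

    Assumption: `rewards` holds one scalar per rollout sampled for the
    same prompt; `eps` only guards against division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled rollouts, reward is 1 on success and 0 on failure.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
all_failed = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)       # successes get positive advantages, failures negative
print(all_failed)  # every advantage is 0.0
```

When every rollout in a group fails, all advantages are zero and the group contributes no learning signal, which is one way to see why sparse rewards slow down long-horizon training.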

Findings

01

Outperforms standard RL baselines in SimuBench

02

Surpasses GPT-4o with few-shot prompting on the same benchmark

03

Ablation studies confirm benefits of curriculum and data augmentation

Abstract

Large language models (LLMs) have revolutionized text-based code automation, but their potential in graph-oriented engineering workflows remains under-explored. We introduce SimuAgent, an LLM-powered modeling and simulation agent tailored for Simulink. SimuAgent replaces verbose XML with a concise, dictionary-style Python representation, dramatically cutting token counts, improving interpretability, and enabling fast, in-process simulation. A lightweight plan-execute architecture, trained in two stages, equips the agent with both low-level tool skills and high-level design reasoning. To tackle sparse rewards in long-horizon tasks, we propose Reflection-GRPO (ReGRPO), which augments Group Relative Policy Optimization (GRPO) with self-reflection traces that supply rich intermediate feedback, accelerating convergence and boosting robustness. Experiments on SimuBench, our newly released…
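
To make the idea of a dictionary-style model representation concrete, here is a minimal sketch. The schema (the `blocks` and `connections` keys, the block names, and the parameter values) is a hypothetical illustration and not the format defined in the paper; the helper shows the kind of cheap structural check such a representation allows without launching a simulator.

```python
# Hypothetical schema: keys, block names, and parameters are illustrative only.
model = {
    "blocks": {
        "step": {"type": "Step", "params": {"Time": 1.0}},
        "gain": {"type": "Gain", "params": {"Gain": 2.0}},
        "scope": {"type": "Scope", "params": {}},
    },
    # Each connection is a (source block, destination block) pair.
    "connections": [("step", "gain"), ("gain", "scope")],
}


def dangling_endpoints(model):
    """Return connection endpoints that name a block missing from the model."""
    known = set(model["blocks"])
    return [
        name
        for edge in model["connections"]
        for name in edge
        if name not in known
    ]


print(dangling_endpoints(model))  # → []
```

Because the whole model is an ordinary Python object, checks like this run in-process and their results can be returned to the agent as plain text.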

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 8 · Confidence 5

Strengths

- The paper effectively frames the problem, shows how previous methods (XML) lead to a large number of tokens, and showcases Python-dictionary representation as a suitable choice
- Reflection and retry is a simple mechanism to tackle the sparse reward issue of just having the output of 0/1 at the end of the episode.
- The SimuBench dataset provides examples over various system-design domains.
- The paper is well written, has done extensive experiments, with multiple ablations and transfer to ot…

Weaknesses

- The algorithm is only compared with GRPO. How does the method compare to other baselines for LLM tool-use and RL?
- Improvements on generic NLP benchmarks are small, code-based tasks show more gain, but SimuBench is the setting where reflection is most helpful.
- More methodological clarifications on the reward structure and on prompt differences for image-based inputs are needed.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The integration of Reflection-GRPO with Simulink tool feedback is a notable contribution. The agent leverages intermediate reflection traces and programmatic validation signals (e.g., structural checks, execution feedback, block-level errors) to guide long-horizon updates. This mechanism improves sample efficiency, stabilizes training under sparse rewards, and provides a general recipe for scaling RLHF-style methods to complex tool-using domains beyond text-only reasoning.
- The Python-dictio…

Weaknesses

- The Introduction section is very well-written and effectively motivates the need for an automation agent for Simulink. However, the proposed method and experimental sections lack critical implementation details and could be substantially improved through better organization. For instance, in the architecture description (Section 3), it would be far more informative if the pipeline stages were presented sequentially, explaining the order of operations and data flow, rather than only listing the…

Reviewer 03 · Rating 0 · Confidence 4

Strengths

- Interesting problem, certainly high industry impact

Weaknesses

- Basically zero scientific novelty. This is an engineering project without many generalizable takeaways.
- Presentation is inconsistent and unclear what the actual contribution is: toolbox, method, architecture, benchmark... All of these are claimed in the paper, but unclear which one is it. For some reason, it is claimed that a "Python-based model representation," which is a dictionary, is a contribution. Certainly not for a top conference. It supposedly improves interpretability. This obvious…

Taxonomy

Topics: Machine Learning in Materials Science · Topic Modeling · Model-Driven Software Engineering Techniques