RoboMoRe: LLM-based Robot Co-design via Joint Optimization of Morphology and Reward

Jiawei Fang; Yuxuan Sun; Chengtian Ma; Qiuyu Lu; Lining Yao

arXiv:2506.00276·cs.RO·June 3, 2025

Open Access · 3 Reviews

TL;DR

RoboMoRe introduces an LLM-driven framework for joint optimization of robot morphology and reward functions, enabling more diverse and effective robot designs without task-specific templates, outperforming existing methods.

Contribution

The paper presents RoboMoRe, a novel LLM-based approach that integrates morphology and reward shaping for co-optimization in robot design, addressing limitations of fixed reward functions.

Findings

01 Outperforms human-engineered designs across eight tasks.

02 Effectively explores diverse morphology-reward pairs.

03 Does not require task-specific prompts or predefined templates.

Abstract

Robot co-design, jointly optimizing morphology and control policy, remains a longstanding challenge in the robotics community, where many promising robots have been developed. However, a key limitation lies in its tendency to converge to sub-optimal designs due to the use of fixed reward functions, which fail to explore the diverse motion modes suitable for different morphologies. Here we propose RoboMoRe, a large language model (LLM)-driven framework that integrates morphology and reward shaping for co-optimization within the robot co-design loop. RoboMoRe performs a dual-stage optimization: in the coarse optimization stage, an LLM-based diversity reflection mechanism generates both diverse and high-quality morphology-reward pairs and efficiently explores their distribution. In the fine optimization stage, top candidates are iteratively refined through alternating LLM-guided reward and…
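The coarse-to-fine loop described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the LLM is replaced by random sampling, `toy_fitness` stands in for training and evaluating a policy, diversity is approximated by rejecting proposals that fall too close to earlier ones, and the fine stage is assumed to alternate between reward and morphology updates (the abstract is cut off at that point). All function names and parameter values are hypothetical.

```python
import math
import random


def toy_fitness(morph, reward):
    """Stand-in for 'train a policy, measure task fitness' (assumption)."""
    target = (0.6, 0.3, 1.0, 0.2)  # arbitrary optimum for the toy problem
    return -sum((x - t) ** 2 for x, t in zip(morph + reward, target))


def propose_diverse(rng, seen, min_dist=0.15, tries=50):
    """Stand-in for an LLM proposal step with a diversity check.

    A candidate is 2 morphology parameters followed by 2 reward weights.
    Proposals closer than min_dist to any earlier one are re-sampled.
    """
    cand = None
    for _ in range(tries):
        cand = tuple(rng.random() for _ in range(4))
        if all(math.dist(cand, s) >= min_dist for s in seen):
            break
    return cand


def coarse_stage(rng, n=20):
    """Sample n diverse morphology-reward pairs and rank them by fitness."""
    seen = []
    for _ in range(n):
        seen.append(propose_diverse(rng, seen))
    scored = [(toy_fitness(c[:2], c[2:]), c) for c in seen]
    scored.sort(reverse=True)
    return scored


def fine_stage(rng, cand, rounds=40, step=0.05):
    """Alternately perturb reward weights and morphology, keeping improvements."""
    best, best_fit = cand, toy_fitness(cand[:2], cand[2:])
    for r in range(rounds):
        # Even rounds touch the reward weights, odd rounds the morphology.
        idx = (2, 3) if r % 2 == 0 else (0, 1)
        trial = list(best)
        for i in idx:
            trial[i] = min(1.0, max(0.0, trial[i] + rng.uniform(-step, step)))
        trial = tuple(trial)
        fit = toy_fitness(trial[:2], trial[2:])
        if fit > best_fit:
            best, best_fit = trial, fit
    return best_fit, best


def run_sketch(seed=0, top_k=3):
    """Run both stages; return (best coarse fitness, best refined fitness)."""
    rng = random.Random(seed)
    coarse = coarse_stage(rng)
    refined = [fine_stage(rng, c) for _, c in coarse[:top_k]]
    return coarse[0][0], max(refined)[0]
```

Calling `run_sketch(seed=0)` returns the best fitness before and after refinement. Because the fine stage only accepts improvements, the second value is never lower than the first in this toy setting.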

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

+ This work applies emerging LLM capabilities to the classic robot co-design problem.
+ The paper identifies a limitation of LLM-based design, a tendency toward repetitive morphology outputs, and introduces the concept of diversity reflection to address it.
+ The method is evaluated on multiple tasks, though the experiments are performed in simple simulation environments.

Weaknesses

- The novelty of the work appears limited. The key contributions seem to center on prompt engineering, e.g., prompting the LLM to "reflect" on prior results to increase diversity. The reward shaping and morphology filtering mechanisms also appear straightforward (e.g., discarding repeated designs), and the coarse-to-fine pipeline resembles a standard iterative refinement process, albeit executed via an LLM.
- The paper does not provide sufficient detail regarding the diversity reflection mech

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The method seems novel and exposes a way to utilise LLMs for morphology design. The results seem promising as well.

Weaknesses

Some parts of this work need to be discussed more clearly. For example, it is not clear what the efficiency of a design means formally. In addition, comparisons with existing non-LLM methods such as Transform2Act [1] seem to be missing. Even if this is not an equivalent method, it would still be good to include comparisons for better reference. Finally, the idea of refining the rewards is not very clear to me because it inherently changes what a "performing" agent is. [1] Yuan

Reviewer 03 · Rating 8 · Confidence 5

Strengths

- Care was taken to evaluate the use of LLMs fairly, e.g., by masking elements of the XML files to prevent possible training data contamination.
- Using an LLM-generated reward is a relatively novel idea for co-design, although learned rewards have been used before in co-design (see below) and Eureka has been used in a behaviour-only RL setting.
- The paper is easy to follow and well written, and visualisations are used nicely to support the reader's understanding.

Weaknesses

- The use of efficiency as fitness/volume is not quite clear to me. Why do you not use torque or fitness/energy instead? It seems to me that the algorithms can easily game this metric by producing geom elements that are as thin as possible without increasing actual real-world efficiency (as in, energy spent per forward unit of movement).
- The literature review/discussion of related works is a bit incomplete, as the idea of using learned reward functions has already been explored in previous work (see e.g.
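The concern about fitness/volume can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the paper's metric code: limbs are modelled as capsule-shaped geoms (a cylinder plus two hemispherical caps), and the fitness and energy values are made-up numbers held fixed across both designs. The helper names are hypothetical.

```python
import math


def capsule_volume(radius, length):
    """Volume of a capsule: cylinder of given length plus two hemispherical caps."""
    return math.pi * radius ** 2 * length + (4.0 / 3.0) * math.pi * radius ** 3


def efficiency_by_volume(fitness, geoms):
    """Fitness divided by total body volume (the metric questioned in the review)."""
    return fitness / sum(capsule_volume(r, l) for r, l in geoms)


def efficiency_by_energy(fitness, energy):
    """Fitness divided by energy spent (the alternative the reviewer suggests)."""
    return fitness / energy


# Two hypothetical four-limb designs that differ only in limb radius.
thick_limbs = [(0.08, 0.4)] * 4
thin_limbs = [(0.02, 0.4)] * 4

# Assumed identical task fitness and energy use for both designs.
fitness, energy = 10.0, 50.0

print("fitness/volume, thick:", efficiency_by_volume(fitness, thick_limbs))
print("fitness/volume, thin: ", efficiency_by_volume(fitness, thin_limbs))
print("fitness/energy, both: ", efficiency_by_energy(fitness, energy))
```

With fitness and energy held equal, shrinking the limb radius raises fitness/volume sharply while fitness/energy stays the same, which is the gaming behaviour the reviewer describes.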

Taxonomy

Topics: Modular Robots and Swarm Intelligence · Industrial Technology and Control Systems