Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets

Haruki Abe; Takayuki Osa; Yusuke Mukuta; Tatsuya Harada

arXiv:2602.18025·cs.AI·February 23, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a method combining offline reinforcement learning with cross-embodiment learning to pre-train robot policies across diverse morphologies using heterogeneous datasets, improving scalability and performance.

Contribution

The paper presents a framework that unites offline RL with cross-embodiment learning, together with an embodiment-grouping strategy that mitigates conflicting gradients across diverse robot morphologies.
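To make the idea of conflicting gradients concrete, here is a minimal sketch. It assumes each embodiment contributes one flattened gradient vector for the shared policy parameters, and it treats a negative cosine similarity between two such vectors as a conflict. The function names and the toy numbers are hypothetical illustrations, not the authors' implementation.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # treat a zero gradient as neither aligned nor conflicting
    return dot / (norm1 * norm2)


def conflicting_pairs(grads):
    """Return pairs of embodiment names whose gradients point in opposing directions."""
    names = sorted(grads)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(grads[a], grads[b]) < 0.0:
                pairs.append((a, b))
    return pairs


# Made-up gradients for three hypothetical embodiments (illustration only).
toy_grads = {
    "biped": [1.0, 0.5, 0.0],
    "hexapod": [-1.0, 0.2, 0.1],
    "quadruped": [0.8, 0.6, 0.1],
}
print(conflicting_pairs(toy_grads))
```

In this toy example the hexapod gradient opposes both others, while the biped and quadruped gradients are aligned. That is the kind of pattern a grouping strategy could exploit.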

Findings

01

Combining offline RL with cross-embodiment learning outperforms pure behavior cloning when pre-training on datasets rich in suboptimal trajectories.

02

Conflict among different robot morphologies can hinder learning.

03

Grouping robots by morphology reduces conflicts and enhances learning performance.
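The grouping idea can be sketched as follows, assuming a pairwise morphology distance matrix is already available (the reviews below mention FGW distance for this role). The clustering rule here, connected components under a distance threshold, is a simple stand-in chosen for illustration; this page does not specify the paper's actual procedure. The names `group_by_distance`, `threshold`, and the toy distances are all hypothetical.

```python
def group_by_distance(names, dist, threshold):
    """Group items into connected components where linked pairs have distance <= threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two components

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


# Made-up symmetric distances between four hypothetical robots.
robot_names = ["biped_a", "biped_b", "quad_a", "quad_b"]
toy_dist = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
]
print(group_by_distance(robot_names, toy_dist, threshold=0.3))
```

With a threshold of 0.3 the two bipeds and the two quadrupeds form separate groups. A stricter threshold splits every robot into its own group, and a looser one merges them all, so the threshold would need tuning in practice.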

Abstract

Scalable robot policy pre-training has been hindered by the high cost of collecting high-quality demonstrations for each platform. In this study, we address this issue by uniting offline reinforcement learning (offline RL) with cross-embodiment learning. Offline RL leverages both expert and abundant suboptimal data, and cross-embodiment learning aggregates heterogeneous robot trajectories across diverse morphologies to acquire universal control priors. We perform a systematic analysis of this offline RL and cross-embodiment paradigm, providing a principled understanding of its strengths and limitations. To evaluate this offline RL and cross-embodiment paradigm, we construct a suite of locomotion datasets spanning 16 distinct robot platforms. Our experiments confirm that this combined approach excels at pre-training with datasets rich in suboptimal trajectories, outperforming pure…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper fills an important gap between robot foundation models and morphology-aware generalization. Prior work (e.g., URMA) has explored transfer across morphologies but typically ignores the destructive gradient interference that arises when training across incompatible embodiments. EG’s use of FGW distance to quantify morphology-level relationships and structure training groups is an elegant and novel solution.
2. The experimental setup is extensive and thoughtfully designed. Using 16 di

Weaknesses

1. While FGW distance effectively captures morphological similarity, there is no formal argument linking FGW similarity to gradient alignment or loss landscape smoothness. Without such a bridge, the theoretical contribution remains descriptive rather than predictive.
2. Embodiment relationships evolve as policies adapt. Fixed grouping may lead to stale partitions that no longer reflect actual learning dynamics.
3. The morphological embeddings used to compute FGW distance come from URMA, which

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The setting of cross-embodiment offline RL is novel in the community.
2. The proposed method is easy to understand and effective in practice.
3. The comparison involves gradient projection techniques used in the continual learning and multi-task literature.
4. The evaluation across different embodiments is comprehensive.

Weaknesses

1. The tasks cover only walking locomotion. A comprehensive analysis or benchmark should involve more diverse tasks.
2. The only offline RL baseline is IQL. More algorithms, such as CQL, should be evaluated. It is unclear whether the claims extend to other algorithms.
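For readers unfamiliar with IQL, the baseline named in this review, the sketch below shows the asymmetric expectile loss that IQL uses to fit its value function: squared errors are weighted by `tau` when Q exceeds V and by `1 - tau` otherwise. The function names and the default `tau` are illustrative assumptions, and the paper's own configuration is not given on this page.

```python
def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff^2, with diff = Q(s, a) - V(s)."""
    weight = (1.0 - tau) if diff < 0 else tau
    return weight * diff * diff


def value_loss(q_values, v_values, tau=0.7):
    """Mean expectile loss over a batch of Q and V estimates (plain Python lists)."""
    losses = [expectile_loss(q - v, tau) for q, v in zip(q_values, v_values)]
    return sum(losses) / len(losses)


# Made-up batch: V underestimates Q on the first sample and overestimates on the second.
print(value_loss([3.0, 1.0], [1.0, 3.0], tau=0.7))
```

With `tau` above 0.5, underestimating Q is penalized more than overestimating it, which pushes V toward an upper expectile of the Q-values seen in the dataset. Setting `tau = 0.5` recovers an ordinary symmetric squared error (scaled by one half).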

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The experimental results are strong, with experiments conducted on as many as 16 different types of robots. The proposed method shows a clear improvement over the baseline.
2. The paper employs a clear validation approach to demonstrate the impact of gradient conflicts caused by cross-embodiment data.

Weaknesses

1. The work lacks real-robot experiments. All experiments are conducted in simulation, with no validation on physical robots.
2. Some acronyms are used before they are defined; for example, “EG” appears in Table 1 before being formally introduced, which hurts readability.
3. The implementation evaluates only forward and backward motion across different robots and lacks validation on a broader range of tasks.

Taxonomy

Topics: Reinforcement Learning in Robotics · Social Robot Interaction and HRI · Robot Manipulation and Learning