Modality-Augmented Fine-Tuning of Foundation Robot Policies for Cross-Embodiment Manipulation on GR1 and G1

Junsung Park, Hogun Kee, and Songhwai Oh

arXiv:2512.01358·cs.RO·December 2, 2025

TL;DR

This paper introduces a modality-augmented fine-tuning framework that adapts foundation robot policies to different humanoid embodiments. Adding contact-state cues and RGB-D fusion raises online success on GR1 from 51% to 63%, and a contact-augmented model reaches 94% success on the G1 "Pick Apple to Bowl" task.

Contribution

It presents a multi-modal fine-tuning approach for adapting foundation policies to the GR1 and Unitree G1 embodiments, adds post-processed contact and metric-depth modalities to public GR1 datasets, and contributes a new multi-modal dataset for the G1.

Findings

01

Contact-state cues and RGB-D fusion improve online success rates on GR1 from 51% to 63%.

02

A contact-augmented model achieves 94% success on the G1 "Pick Apple to Bowl" task.

03

Lightweight post-processing and high-quality multi-modal data are key for transfer.

Abstract

This paper presents a modality-augmented fine-tuning framework designed to adapt foundation robot policies to diverse humanoid embodiments. We validate our approach across two distinct settings: (i) the GR1 embodiment, utilizing public datasets where we introduce post-processed modalities, including binary contact signals and ZoeDepth-generated metric depth; and (ii) the Unitree G1 embodiment, for which we contribute a novel multi-modal dataset incorporating cuRobo motion planning, inverse kinematics, and ground-truth contact-force measurements. Our experiments demonstrate that modality augmentation consistently enhances policy performance across different embodiments. Specifically, for the GR1, integrating contact-state cues and RGB-D fusion improves online success rates from 51% to 63%. Furthermore, in the G1 "Pick Apple to Bowl" task, our contact-augmented model achieves a success…
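The two added modalities named in the abstract, binary contact signals and an RGB-D input, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that a contact flag is obtained by thresholding a force magnitude and that RGB-D fusion is a simple channel concatenation. The function names, the 1.0 N threshold, and the 10 m depth clip are hypothetical choices made for illustration, and the depth map is taken as given (for example, the output of a monocular metric depth estimator such as ZoeDepth).

```python
import numpy as np


def binarize_contact(forces, threshold=1.0):
    """Map contact forces to 0/1 contact flags (hypothetical helper).

    forces: shape (T,) force magnitudes or (T, 3) force vectors, in newtons.
    threshold: assumed cutoff in newtons; not taken from the paper.
    """
    forces = np.asarray(forces, dtype=np.float32)
    if forces.ndim == 2:
        # Reduce force vectors to their Euclidean magnitude.
        forces = np.linalg.norm(forces, axis=-1)
    return (forces > threshold).astype(np.uint8)


def fuse_rgbd(rgb, depth, max_depth=10.0):
    """Stack RGB and metric depth into one 4-channel array (hypothetical helper).

    rgb: (H, W, 3) uint8 image.
    depth: (H, W) metric depth in meters.
    max_depth: assumed clipping range in meters; not taken from the paper.
    """
    rgb = np.asarray(rgb, dtype=np.float32) / 255.0
    depth = np.asarray(depth, dtype=np.float32)
    if rgb.shape[:2] != depth.shape:
        raise ValueError("rgb and depth must share height and width")
    # Clip and scale depth to [0, 1] so it matches the RGB value range.
    depth = np.clip(depth, 0.0, max_depth) / max_depth
    return np.concatenate([rgb, depth[..., None]], axis=-1)
```

A policy that consumes these inputs would receive the 4-channel image in place of the RGB frame and the contact flag as an extra low-dimensional observation. How the paper actually injects either signal into the policy is not stated in this excerpt.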

Taxonomy

Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Robotic Locomotion and Control