Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

Jiaming Zhou; Teli Ma; Kun-Yu Lin; Zifan Wang; Ronghe Qiu; Junwei Liang

arXiv:2406.14235·cs.CV·April 8, 2025

TL;DR

This paper introduces a novel adaptation method that uses paired human-robot videos and contrastive alignment to improve visual representations for robotic manipulation, significantly enhancing performance across diverse tasks.

Contribution

The paper presents a new adaptation paradigm leveraging paired human-robot videos and a contrastive loss to bridge the human-robot domain gap in visual pre-training for manipulation.

Findings

01 Over 7% improvement in success rate across multiple tasks
02 Significant gains on both simulated and real-world benchmarks
03 Effective in single-task and multi-task settings

Abstract

Learning generalizable visual representations across different embodied environments is essential for effective robotic manipulation in real-world scenarios. However, the limited scale and diversity of robot demonstration data pose a significant challenge. Recent research has explored leveraging large-scale human activity data for pre-training, but the substantial morphological differences between humans and robots introduce a significant human-robot domain discrepancy, hindering the generalization of these models to downstream manipulation tasks. To overcome this, we propose a novel adaptation paradigm that leverages readily available paired human-robot video data to bridge the domain gap. Our method employs a human-robot contrastive alignment loss to align the semantics of human and robot videos, adapting pre-trained models to the robot domain in a parameter-efficient manner.…
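The abstract names a human-robot contrastive alignment loss but does not give its form here. As a rough illustration only, the sketch below uses a generic symmetric InfoNCE-style loss over paired embeddings, which is one common way to pull matched pairs together; it is not taken from the paper. The function names, the temperature value, and the assumption that `human[i]` and `robot[i]` are nonzero embeddings of the same task are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a nonzero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(human, robot, temperature):
    """Pairwise cosine similarities between human and robot embeddings, divided by temperature."""
    h = [l2_normalize(v) for v in human]
    r = [l2_normalize(v) for v in robot]
    return [[sum(a * b for a, b in zip(hi, rj)) / temperature for rj in r] for hi in h]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i (the paired clip)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_alignment_loss(human, robot, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed stand-in, not the paper's exact loss).

    human[i] and robot[i] are assumed to be embeddings of a paired human/robot video.
    """
    logits = cosine_logits(human, robot, temperature)
    transposed = [list(col) for col in zip(*logits)]
    # Average the human-to-robot and robot-to-human directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy usage: correctly paired embeddings should score a much lower loss than swapped ones.
human_emb = [[1.0, 0.0], [0.0, 1.0]]
robot_paired = [[1.0, 0.0], [0.0, 1.0]]
robot_swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_paired = contrastive_alignment_loss(human_emb, robot_paired)
loss_swapped = contrastive_alignment_loss(human_emb, robot_swapped)
```

In a parameter-efficient setup like the one the abstract describes, a loss of this kind would typically be minimized with respect to a small set of adapter parameters while the pre-trained backbone stays frozen; that training loop is outside the scope of this sketch.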

Taxonomy

Topics: Robot Manipulation and Learning

Methods: ALIGN