OCRA: Object-Centric Learning with 3D and Tactile Priors for Human-to-Robot Action Transfer

Kuanning Wang; Ke Fan; Yuqian Fu; Siyu Lin; Hu Luo; Daniel Seita; Yanwei Fu; Yu-Gang Jiang; Xiangyang Xue

arXiv:2603.14401 · cs.RO · March 17, 2026

Open Access

TL;DR

OCRA introduces an object-centric, multimodal framework combining 3D vision and tactile data to improve human-to-robot action transfer, enabling robots to learn manipulation tasks more robustly from videos.

Contribution

The paper presents a novel object-centric learning framework that integrates 3D visual and tactile priors for improved robot manipulation from human demonstrations.

Findings

01. OCRA outperforms existing baselines in vision-only tasks.

02. OCRA effectively fuses 3D and tactile data for robust manipulation.

03. The approach demonstrates significant improvements in real-world experiments.

Abstract

We present OCRA, an Object-Centric framework for video-based human-to-Robot Action transfer that learns directly from human demonstration videos to enable robust manipulation. Object-centric learning emphasizes task-relevant objects and their interactions while filtering out irrelevant background, providing a natural and scalable way to teach robots. OCRA leverages multi-view RGB videos, the state-of-the-art 3D foundation model VGGT, and advanced detection and segmentation models to reconstruct object-centric 3D point clouds, capturing rich interactions between objects. To handle properties not easily perceived by vision alone, we incorporate tactile priors via a large-scale dataset of over one million tactile images. These 3D and tactile priors are fused through a multimodal module (ResFiLM) and fed into a Diffusion Policy to generate robust manipulation actions. Extensive experiments…
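
The abstract names a multimodal fusion module, ResFiLM, but does not describe its internals. As a rough illustration only, the sketch below assumes the name refers to a FiLM-style (feature-wise linear modulation) layer with a residual connection: the tactile feature predicts a per-channel scale and shift that are applied to the 3D visual feature, and the result is added back to the original visual feature. The function names `linear` and `res_film`, the weight layout, and the direction of conditioning (tactile modulating visual) are all assumptions, not the authors' implementation.

```python
def linear(x, weight, bias):
    """Compute y = W x + b for one vector x; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def res_film(visual, tactile, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical residual FiLM fusion (assumed form, not from the paper).

    gamma and beta are predicted from the tactile feature, then
    out = visual + (gamma * visual + beta), applied per channel.
    """
    gamma = linear(tactile, w_gamma, b_gamma)  # per-channel scale
    beta = linear(tactile, w_beta, b_beta)     # per-channel shift
    return [v + g * v + b for v, g, b in zip(visual, gamma, beta)]


# Toy example: a 2-channel visual feature and a 1-channel tactile feature.
visual = [1.0, 2.0]
tactile = [1.0]
fused = res_film(
    visual, tactile,
    w_gamma=[[0.5], [0.0]], b_gamma=[0.0, 0.0],
    w_beta=[[0.0], [1.0]], b_beta=[0.0, 0.0],
)
print(fused)  # [1.5, 3.0]
```

One property of this assumed form is that when the scale and shift weights are all zero, the module returns the visual feature unchanged, so the tactile input can only add to what the vision branch already provides. In the pipeline the abstract describes, a fused feature of this kind is what gets passed to the Diffusion Policy as conditioning.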

Taxonomy

Topics: Robot Manipulation and Learning · Advanced Sensor and Energy Harvesting Materials · Social Robot Interaction and HRI