OCRA: Object-Centric Learning with 3D and Tactile Priors for Human-to-Robot Action Transfer
Kuanning Wang, Ke Fan, Yuqian Fu, Siyu Lin, Hu Luo, Daniel Seita, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue

TL;DR
OCRA introduces an object-centric, multimodal framework combining 3D vision and tactile data to improve human-to-robot action transfer, enabling robots to learn manipulation tasks more robustly from videos.
Contribution
The paper presents a novel object-centric learning framework that integrates 3D visual and tactile priors for improved robot manipulation from human demonstrations.
Findings
OCRA outperforms existing baselines in vision-only tasks.
OCRA effectively fuses 3D and tactile data for robust manipulation.
The approach demonstrates significant improvements in real-world experiments.
Abstract
We present OCRA, an Object-Centric framework for video-based human-to-Robot Action transfer that learns directly from human demonstration videos to enable robust manipulation. Object-centric learning emphasizes task-relevant objects and their interactions while filtering out irrelevant background, providing a natural and scalable way to teach robots. OCRA leverages multi-view RGB videos, the state-of-the-art 3D foundation model VGGT, and advanced detection and segmentation models to reconstruct object-centric 3D point clouds, capturing rich interactions between objects. To handle properties not easily perceived by vision alone, we incorporate tactile priors via a large-scale dataset of over one million tactile images. These 3D and tactile priors are fused through a multimodal module (ResFiLM) and fed into a Diffusion Policy to generate robust manipulation actions. Extensive experiments…
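The abstract names the fusion module (ResFiLM) but gives no details of its internals. As a rough illustration only, the sketch below assumes ResFiLM is a FiLM-style modulation with a residual connection: tactile features predict a per-channel scale and shift that are applied to the 3D visual features and added back to them. The function names, shapes, and weights here are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Return weight @ x + bias for a single feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def res_film_fuse(visual: Vector, tactile: Vector,
                  w_gamma: Matrix, b_gamma: Vector,
                  w_beta: Matrix, b_beta: Vector) -> Vector:
    """Hypothetical residual FiLM fusion (an assumption, not the paper's code).

    The tactile features predict a per-channel scale (gamma) and shift (beta).
    The modulated visual features are added back to the originals, so zero
    weights and biases leave the visual features unchanged.
    """
    gamma = linear(tactile, w_gamma, b_gamma)
    beta = linear(tactile, w_beta, b_beta)
    return [v + g * v + b for v, g, b in zip(visual, gamma, beta)]


# Toy example: two visual channels conditioned on one tactile channel.
visual = [1.0, 2.0]
tactile = [0.5]
w_gamma = [[2.0], [0.0]]   # scales channel 0 only
w_beta = [[0.0], [1.0]]    # shifts channel 1 only
zero_bias = [0.0, 0.0]

fused = res_film_fuse(visual, tactile, w_gamma, zero_bias, w_beta, zero_bias)
print(fused)  # [2.0, 2.5]
```

In a pipeline like the one the abstract describes, a fused vector of this kind would serve as the conditioning input to the Diffusion Policy. The residual form is a design assumption that keeps the vision-only pathway intact when the tactile signal carries no information.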
Taxonomy
Topics: Robot Manipulation and Learning · Advanced Sensor and Energy Harvesting Materials · Social Robot Interaction and HRI
