RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot
Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, Junbo Wang, Haoyi Zhu, Cewu Lu

TL;DR
This paper introduces RH20T, a large-scale, multi-modal robotic dataset with over 110,000 real-world manipulation sequences, designed to facilitate learning diverse, complex skills through one-shot imitation learning.
Contribution
The paper presents a comprehensive, high-quality dataset with multi-modal sensory data, human demonstrations, and language descriptions to advance generalizable robotic skill acquisition.
Findings
Dataset includes over 110,000 sequences across diverse skills and contexts.
Sequences contain visual, force, audio, and action data for rich perception.
High-quality calibration and real-world collection enhance dataset reliability.
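To make the modality list above concrete, here is a minimal sketch of how one multi-modal sequence could be held in memory. It is an illustration only: the class `ManipulationSequence`, its field names, the helper `modalities_present`, and the 6- and 7-dimensional placeholder vectors are all assumptions made for this example, not the dataset's actual schema or loader API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ManipulationSequence:
    """Hypothetical container for one multi-modal sequence (not the RH20T schema)."""
    task_description: str  # language description of the task
    rgb_frames: List[bytes] = field(default_factory=list)           # encoded camera frames
    force_torque: List[List[float]] = field(default_factory=list)   # per-step force readings
    audio_samples: List[float] = field(default_factory=list)        # raw audio samples
    actions: List[List[float]] = field(default_factory=list)        # per-step robot actions


def modalities_present(seq: ManipulationSequence) -> List[str]:
    """Return the names of the modalities that carry at least one sample."""
    candidates = {
        "visual": seq.rgb_frames,
        "force": seq.force_torque,
        "audio": seq.audio_samples,
        "action": seq.actions,
    }
    return [name for name, data in candidates.items() if len(data) > 0]


# Placeholder values only; the vector sizes are assumed for illustration.
example = ManipulationSequence(
    task_description="placeholder task",
    rgb_frames=[b"frame0"],
    force_torque=[[0.0] * 6],
    actions=[[0.0] * 7],
)
print(modalities_present(example))
```

A check like `modalities_present` may be useful when filtering sequences for a policy that needs a specific sensor, for example keeping only those with force data for contact-rich skills.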
Abstract
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations. This feature is attractive for enabling robots to acquire new skills and improving task and motion planning. However, due to limitations in the training dataset, the current focus of the community has mainly been on simple cases, such as push or pick-place tasks, relying solely on visual guidance. In reality, there are many complex skills, some of which may even require both visual and tactile perception to solve. This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception. To achieve this, we have collected a dataset comprising over 110,000 contact-rich robot…
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
MethodsFocus
