Teaching robots to imitate a human with no on-teacher sensors. What are the key challenges?
Radoslav Skoviera, Karla Stepanova, Michael Tesar, Gabriela Sejnova, Jiri Sedlar, Michal Vavrecka, Robert Babuska, and Josef Sivic

TL;DR
This paper explores learning object manipulation tasks from human demonstrations using RGB or RGB-D cameras without on-teacher sensors, addressing challenges in data capture, pose estimation, and natural language processing.
Contribution
The paper identifies key challenges in sensor selection, 6DoF pose estimation, and natural language processing, and presents an architecture for transferring tasks learned without on-teacher sensors to simulated and real robot environments.
Findings
Two showcases: a gluing task with a glue gun and simple block-stacking with variable blocks
Discussion of how a linguistic description of the task could improve the accuracy of the task description
Architecture for transferring the imitated task to simulated and real robot environments
Abstract
In this paper, we consider the problem of learning object manipulation tasks from human demonstration using RGB or RGB-D cameras. We highlight the key challenges in capturing sufficiently good data with no tracking devices, from sensor selection and accurate 6DoF pose estimation to natural language processing. In particular, we focus on two showcases: a gluing task with a glue gun and simple block-stacking with variable blocks. Furthermore, we discuss how a linguistic description of the task could help to improve the accuracy of the task description. We also present the whole architecture for transferring the imitated task to simulated and real robot environments.
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
