Human Demonstrations are Generalizable Knowledge for Robots
Te Cui, Tianxing Zhou, Zicai Peng, Mengxiao Hu, Haoyang Lu, Haizhou Li, Guangyan Chen, Meiling Wang, Yufeng Yue

TL;DR
This paper introduces DigKnow, a hierarchical method that distills generalizable knowledge from human demonstration videos, enabling robots to better understand, plan, and execute diverse tasks by leveraging large language models.
Contribution
It presents a hierarchical approach for distilling knowledge from human demonstration videos and uses LLMs to improve how robots generalize to and execute new tasks.
Findings
Enhanced success rates in diverse tasks and scenes
Effective retrieval and application of demonstration-derived knowledge
Improved robot generalization to new objects and tasks
Abstract
Learning from human demonstrations is an emerging trend for designing intelligent robotic systems. However, previous methods typically regard videos as instructions, simply dividing them into action sequences for robotic repetition, which poses obstacles to generalization to diverse tasks or object instances. In this paper, we propose a different perspective, considering human demonstration videos not as mere instructions, but as a source of knowledge for robots. Motivated by this perspective and the remarkable comprehension and generalization capabilities exhibited by large language models (LLMs), we propose DigKnow, a method that DIstills Generalizable KNOWledge with a hierarchical structure. Specifically, DigKnow begins by converting human demonstration video frames into observation knowledge. This knowledge is then subjected to analysis to extract human action knowledge and further…
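The first two levels of the hierarchy described above, in which frames become observation knowledge and observations are analyzed into action knowledge, can be illustrated with a minimal sketch. This is not DigKnow's implementation. It assumes, hypothetically, that observation knowledge is a set of (subject, relation, object) facts per frame and that an action is recovered by diffing consecutive observations. The names `Fact`, `ActionKnowledge`, and `extract_actions` are illustrative only, and the later levels of the hierarchy (elided in the abstract) are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

# Hypothetical observation knowledge: (subject, relation, object) facts for one frame.
Fact = Tuple[str, str, str]


@dataclass(frozen=True)
class ActionKnowledge:
    """Hypothetical action knowledge: which facts a human action removed and added."""
    step: int                    # index of the frame where the change is observed
    removed: FrozenSet[Fact]     # facts true before the action but not after
    added: FrozenSet[Fact]       # facts true after the action but not before


def extract_actions(observations: List[Set[Fact]]) -> List[ActionKnowledge]:
    """Diff consecutive per-frame observations; each non-empty diff is one action."""
    actions = []
    for i in range(1, len(observations)):
        prev, curr = observations[i - 1], observations[i]
        removed = frozenset(prev - curr)
        added = frozenset(curr - prev)
        if removed or added:  # frames with no state change yield no action
            actions.append(ActionKnowledge(i, removed, added))
    return actions


# Toy demonstration: a cup is picked up from a table and placed on a shelf.
demo_frames = [
    {("cup", "on", "table")},
    {("cup", "on", "table")},   # no change: skipped
    {("cup", "in", "hand")},
    {("cup", "on", "shelf")},
]
demo_actions = extract_actions(demo_frames)

for act in demo_actions:
    print(act.step, sorted(act.removed), "->", sorted(act.added))
```

Representing states as sets keeps the action step to two set differences, which is easy to serialize into an LLM prompt; a real system would need perception to produce these facts and would likely need richer relations than this toy vocabulary.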
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
