Human Demonstrations are Generalizable Knowledge for Robots

Te Cui; Tianxing Zhou; Zicai Peng; Mengxiao Hu; Haoyang Lu; Haizhou Li; Guangyan Chen; Meiling Wang; Yufeng Yue

arXiv:2312.02419 · cs.RO · July 18, 2025 · 1 citation

TL;DR

This paper introduces DigKnow, a hierarchical method that distills generalizable knowledge from human demonstration videos, enabling robots to better understand, plan, and execute diverse tasks by leveraging large language models.

Contribution

It presents a novel approach to hierarchical knowledge distillation from human demonstration videos that integrates LLMs to improve robot task generalization and execution.

Findings

1. Higher success rates across diverse tasks and scenes
2. Effective retrieval and application of knowledge derived from demonstrations
3. Improved robot generalization to new objects and tasks

Abstract

Learning from human demonstrations is an emerging trend for designing intelligent robotic systems. However, previous methods typically regard videos as instructions, simply dividing them into action sequences for robotic repetition, which hinders generalization to diverse tasks or object instances. In this paper, we propose a different perspective, considering human demonstration videos not as mere instructions, but as a source of knowledge for robots. Motivated by this perspective and the remarkable comprehension and generalization capabilities exhibited by large language models (LLMs), we propose DigKnow, a method that DIstills Generalizable KNOWledge with a hierarchical structure. Specifically, DigKnow begins by converting human demonstration video frames into observation knowledge. This knowledge is then analyzed to extract human action knowledge and further…
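
The abstract describes the first two levels of DigKnow's hierarchy: video frames become observation knowledge, which is then analyzed to extract human action knowledge. The sketch below is not the authors' implementation. It assumes, hypothetically, that observation knowledge is a set of symbolic (subject, predicate, object) relations per frame produced by some upstream perception step, and it treats action knowledge as the relations removed and added between consecutive observations. The names `Relation`, `ActionKnowledge`, and `extract_action_knowledge` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical schema: a (subject, predicate, object) relation, e.g. ("cup", "on", "table").
Relation = tuple[str, str, str]


@dataclass(frozen=True)
class ActionKnowledge:
    """Hypothetical action record: the state change between two consecutive observations."""
    removed: frozenset[Relation]  # relations that held before the action but not after
    added: frozenset[Relation]    # relations that hold after the action but not before


def extract_action_knowledge(observations: list[frozenset[Relation]]) -> list[ActionKnowledge]:
    """Diff consecutive per-frame observations; frames with no change yield no action."""
    actions = []
    for before, after in zip(observations, observations[1:]):
        if before != after:
            actions.append(ActionKnowledge(removed=before - after, added=after - before))
    return actions


# Usage: a toy "put the cup in the drawer" demonstration, assumed to be already perceived.
frames = [
    frozenset({("cup", "on", "table"), ("drawer", "is", "closed")}),
    frozenset({("cup", "on", "table"), ("drawer", "is", "closed")}),  # no change between frames
    frozenset({("cup", "on", "table"), ("drawer", "is", "open")}),
    frozenset({("cup", "in", "drawer"), ("drawer", "is", "open")}),
]
for step in extract_action_knowledge(frames):
    print(sorted(step.removed), "->", sorted(step.added))
```

Representing each action as a before/after difference of relations, rather than as a raw motion trajectory, is one way to make a demonstration independent of the exact object instance, which matches the generalization goal stated in the abstract; how DigKnow actually encodes each level is not specified in this excerpt.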

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning