Generalizable task representation learning from human demonstration videos: a geometric approach
Jun Jin, Martin Jagersand

TL;DR
This paper introduces a geometric approach to learn generalizable task representations from human demonstration videos, enabling robots to perform tasks with different objects without additional training.
Contribution
It proposes CoVGS-IL, a method that uses a graph-structured task function to encode task geometry, allowing transfer to robot controllers without extra robot training or pre-recorded robot motions.
Findings
Enables task generalization across categorical objects.
Transfers learned representations to robot controllers via uncalibrated visual servoing.
Eliminates the need for extra robot training or pre-recorded motions.
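The findings above mention uncalibrated visual servoing (UVS) as the control layer. As a rough illustration of that general idea, the sketch below estimates an image Jacobian online with a Broyden rank-one update and drives a task error toward zero. This is a textbook UVS scheme and is not confirmed to be the authors' controller. The function names (`broyden_update`, `uvs_step`, `run_demo`), the 2-D linear toy system, and the gain value are all assumptions made for illustration. In CoVGS-IL the error vector would come from the learned geometric task function rather than from a hand-written map.

```python
import numpy as np

def broyden_update(J, dq, de, alpha=1.0):
    """Rank-one secant update: J + alpha * (de - J dq) dq^T / (dq^T dq)."""
    denom = float(dq @ dq)
    if denom < 1e-12:
        # Motion too small to carry information; keep the old estimate.
        return J
    return J + alpha * np.outer(de - J @ dq, dq) / denom

def uvs_step(J, e, gain=0.5):
    """Damped Gauss-Newton step using the current Jacobian estimate."""
    return -gain * np.linalg.pinv(J) @ e

def run_demo(steps=50):
    # Hypothetical linear "camera + arm" map, unknown to the controller.
    A_true = np.array([[2.0, 0.5], [-0.3, 1.5]])
    target = np.array([1.0, -2.0])

    def error_fn(q):
        # Toy task error; stands in for a geometric task function.
        return A_true @ q - target

    q = np.zeros(2)
    J = np.eye(2)  # rough initial Jacobian guess, no calibration
    e = error_fn(q)
    for _ in range(steps):
        dq = uvs_step(J, e)
        q_new = q + dq
        e_new = error_fn(q_new)
        # Refine the Jacobian from the observed change in error.
        J = broyden_update(J, dq, e_new - e)
        q, e = q_new, e_new
    return q, e, J

if __name__ == "__main__":
    q, e, J = run_demo()
    print("final error norm:", np.linalg.norm(e))
```

The controller never sees `A_true`; it only observes how the error changes after each motion, which is what makes the scheme "uncalibrated".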
Abstract
We study the problem of generalizable task learning from human demonstration videos without extra training on the robot or pre-recorded robot motions. Given a set of human demonstration videos showing a task with different objects/tools (categorical objects), we aim to learn a representation of visual observation that generalizes to categorical objects and enables efficient controller design. We propose to introduce a geometric task structure to the representation learning problem that geometrically encodes the task specification from human demonstration videos, and that enables generalization by building task specification correspondence between categorical objects. Specifically, we propose CoVGS-IL, which uses a graph-structured task function to learn task representations under structural constraints. Our method enables task generalization by selecting geometric features from…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
