Visual Geometric Skill Inference by Watching Human Demonstration
Jun Jin, Laura Petrich, Zichen Zhang, Masood Dehghan, Martin Jagersand

TL;DR
This paper introduces a graph-based kernel regression approach using InMaxEnt IRL to infer geometric manipulation skills from human demonstration videos, enabling human-readable task definitions and robust control error outputs.
Contribution
It presents a novel method that infers geometric associations directly from videos without extensive feature selection, improving generalization and simplicity over traditional approaches.
Findings
Accurately infers geometric associations from a single demonstration
Generalizes well under variance in demonstrations
Eliminates need for tedious feature tracking in visual servoing
Abstract
We study the problem of learning manipulation skills from human demonstration video by inferring the association relationships between geometric features. Motivation for this work stems from the observation that humans perform eye-hand coordination tasks by using geometric primitives to define a task while a geometric control error drives the task through execution. We propose a graph based kernel regression method to directly infer the underlying association constraints from human demonstration video using Incremental Maximum Entropy Inverse Reinforcement Learning (InMaxEnt IRL). The learned skill inference provides human readable task definition and outputs control errors that can be directly plugged into traditional controllers. Our method removes the need for tedious feature selection and robust feature trackers required in traditional approaches (e.g. feature-based visual…
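To make the notion of a geometric control error concrete, the sketch below computes two constraints commonly used in image-based visual servoing: a point-to-point error and a point-to-line error. It is only an illustration under stated assumptions. The function names are hypothetical, the inputs are assumed to be 2D pixel coordinates, and the paper's own error definitions and learned associations are not reproduced here.

```python
import numpy as np


def point_to_point_error(p, q):
    """2D residual between image points p and q (illustrative helper).

    The residual is zero exactly when the two points coincide.
    """
    return np.asarray(q, dtype=float) - np.asarray(p, dtype=float)


def point_to_line_error(p, a, b):
    """Signed perpendicular distance (pixels) from p to the line through a and b.

    Uses homogeneous coordinates: the line through a and b is their cross
    product, and its dot product with p is proportional to the distance.
    """
    p_h = np.append(np.asarray(p, dtype=float), 1.0)
    a_h = np.append(np.asarray(a, dtype=float), 1.0)
    b_h = np.append(np.asarray(b, dtype=float), 1.0)
    line = np.cross(a_h, b_h)
    norm = np.hypot(line[0], line[1])  # length of the line's normal vector
    if norm == 0.0:
        raise ValueError("a and b must be distinct points")
    return float(line @ p_h / norm)
```

Either value can serve as the error signal of a conventional controller, for example a proportional law that drives the error toward zero. Which features are associated, and through which constraint, is what the learned skill inference is meant to supply.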
