Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms
Sumedh A Sontakke, Sumegh Roychowdhury, Mausoom Sarkar, Nikaash Puri, Balaji Krishnamurthy, Laurent Itti

TL;DR
Video2Skill introduces a method for robots to learn and adapt skills from human demonstration videos by segmenting events and transferring representations to robotic actions, enabling zero-shot skill generation.
Contribution
The paper presents a novel approach combining sequence-to-sequence auto-encoders and cyclic MDP homomorphisms for event segmentation and skill transfer from videos to robots.
Findings
Effective event segmentation in demonstration videos.
Successful transfer of learned representations to robotic skills.
Improved zero-shot skill generation in robots.
Abstract
Humans excel at learning long-horizon tasks from demonstrations augmented with textual commentary, as evidenced by the burgeoning popularity of tutorial videos online. Intuitively, this capability can be separated into two distinct subtasks: first, dividing a long-horizon demonstration sequence into semantically meaningful events; second, adapting such events into meaningful behaviors in one's own environment. Here, we present Video2Skill (V2S), which attempts to extend this capability to artificial agents by allowing a robot arm to learn from human cooking videos. We first use sequence-to-sequence Auto-Encoder style architectures to learn a temporal latent space for events in long-horizon demonstrations. We then transfer these representations to the robotic target domain, using a small amount of offline and unrelated interaction data (sequences of state-action pairs of the robot arm…
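The title and contribution refer to MDP homomorphisms. As a rough illustration only, the sketch below checks the standard (non-cyclic) MDP homomorphism conditions on a toy problem: a state map `f` and a state-dependent action map `g` must preserve rewards and must preserve transition probabilities once next states are grouped by their image under `f`. It is not the paper's cyclic variant or its implementation; the function name, the dictionary layout, and the toy MDPs are all assumptions made for this example.

```python
from collections import defaultdict


def is_mdp_homomorphism(P, R, P_abs, R_abs, f, g, tol=1e-9):
    """Check the standard MDP homomorphism conditions (hypothetical helper).

    P[(s, a)]     -> {next_state: probability} for the source MDP
    R[(s, a)]     -> reward for the source MDP
    P_abs, R_abs  -> same layout for the target (abstract) MDP
    f[s]          -> image of state s in the target MDP
    g[(s, a)]     -> image of action a taken in state s
    """
    for (s, a), next_dist in P.items():
        key = (f[s], g[(s, a)])
        if key not in P_abs:
            return False
        # Reward condition: R'(f(s), g_s(a)) == R(s, a)
        if abs(R[(s, a)] - R_abs[key]) > tol:
            return False
        # Transition condition: probability mass landing in each block of f
        # must equal the target MDP's probability of reaching that block.
        block_mass = defaultdict(float)
        for s_next, p in next_dist.items():
            block_mass[f[s_next]] += p
        for z in set(block_mass) | set(P_abs[key]):
            if abs(block_mass.get(z, 0.0) - P_abs[key].get(z, 0.0)) > tol:
                return False
    return True


# Toy source MDP (assumed): two equivalent start states and one goal state.
P = {
    ("L1", "go"): {"G": 1.0},
    ("L2", "go"): {"G": 1.0},
    ("G", "stay"): {"G": 1.0},
}
R = {("L1", "go"): 0.0, ("L2", "go"): 0.0, ("G", "stay"): 1.0}

# Toy target MDP (assumed): the two start states collapse into "A".
P_abs = {("A", "go"): {"B": 1.0}, ("B", "stay"): {"B": 1.0}}
R_abs = {("A", "go"): 0.0, ("B", "stay"): 1.0}

f = {"L1": "A", "L2": "A", "G": "B"}
g = {("L1", "go"): "go", ("L2", "go"): "go", ("G", "stay"): "stay"}

ok = is_mdp_homomorphism(P, R, P_abs, R_abs, f, g)
```

In this toy case `L1` and `L2` behave identically, so merging them into a single target state satisfies both conditions. If one of them instead reached the goal only half the time, the transition condition would fail for that state.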
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Human Pose and Action Recognition
