Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms

Sumedh A Sontakke; Sumegh Roychowdhury; Mausoom Sarkar; Nikaash Puri; Balaji Krishnamurthy; Laurent Itti

arXiv:2109.03813·cs.AI·September 13, 2021

Open Access

TL;DR

Video2Skill introduces a method for robots to learn and adapt skills from human demonstration videos by segmenting events and transferring representations to robotic actions, enabling zero-shot skill generation.

Contribution

The paper presents a novel approach combining sequence-to-sequence auto-encoders and cyclic MDP homomorphisms for event segmentation and skill transfer from videos to robots.

Findings

01 Effective event segmentation in demonstration videos.

02 Successful transfer of learned representations to robotic skills.

03 Improved zero-shot skill generation in robots.

Abstract

Humans excel at learning long-horizon tasks from demonstrations augmented with textual commentary, as evidenced by the burgeoning popularity of tutorial videos online. Intuitively, this capability can be separated into two distinct subtasks: first, dividing a long-horizon demonstration sequence into semantically meaningful events; second, adapting such events into meaningful behaviors in one's own environment. Here, we present Video2Skill (V2S), which attempts to extend this capability to artificial agents by allowing a robot arm to learn from human cooking videos. We first use sequence-to-sequence Auto-Encoder style architectures to learn a temporal latent space for events in long-horizon demonstrations. We then transfer these representations to the robotic target domain, using a small amount of offline and unrelated interaction data (sequences of state-action pairs of the robot arm…
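To make the first step of the abstract concrete, the following is a minimal sketch of a sequence-to-sequence auto-encoder that compresses a clip of per-frame features into one latent vector and decodes it back into a sequence. It is not the authors' architecture: the class name `SeqAutoEncoder`, the plain tanh recurrence, the feature and latent sizes, and the random untrained weights are all assumptions chosen for illustration, and only the forward pass and reconstruction loss are shown (no training loop).

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-based matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SeqAutoEncoder:
    """Hypothetical recurrent auto-encoder (illustrative, untrained)."""

    def __init__(self, feat_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.latent_dim = latent_dim
        self.enc_in = make_matrix(latent_dim, feat_dim, rng)
        self.enc_rec = make_matrix(latent_dim, latent_dim, rng)
        self.dec_rec = make_matrix(latent_dim, latent_dim, rng)
        self.dec_out = make_matrix(feat_dim, latent_dim, rng)

    def encode(self, frames):
        """Fold the whole sequence into one latent vector (final hidden state)."""
        h = [0.0] * self.latent_dim
        for x in frames:
            a = matvec(self.enc_in, x)
            b = matvec(self.enc_rec, h)
            h = [math.tanh(p + q) for p, q in zip(a, b)]
        return h

    def decode(self, z, length):
        """Unroll the latent vector back into `length` feature frames."""
        h = list(z)
        frames = []
        for _ in range(length):
            frames.append(matvec(self.dec_out, h))
            h = [math.tanh(v) for v in matvec(self.dec_rec, h)]
        return frames


def reconstruction_loss(frames, recon):
    """Mean squared error between the input and reconstructed sequences."""
    total = sum((a - b) ** 2 for f, r in zip(frames, recon) for a, b in zip(f, r))
    return total / (len(frames) * len(frames[0]))


# Toy "event": 6 frames, each a 4-dimensional feature vector (made-up data).
data_rng = random.Random(1)
clip = [[data_rng.random() for _ in range(4)] for _ in range(6)]

model = SeqAutoEncoder(feat_dim=4, latent_dim=3)
z = model.encode(clip)               # temporal latent for the event
recon = model.decode(z, len(clip))   # reconstructed frame features
loss = reconstruction_loss(clip, recon)
```

In a trained model of this kind, minimizing `loss` over many clips would push `z` to summarize what happens in the event; here the weights are random, so the sketch only illustrates the data flow and shapes.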

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Human Pose and Action Recognition