Reward Learning from Narrated Demonstrations
Hsiao-Yu Fish Tung, Adam W. Harley, Liang-Kang Huang, Katerina Fragkiadaki

TL;DR
This paper introduces a method for training instructable robotic agents using narrated visual demonstrations, enabling natural language understanding and generalization to new objects and locations.
Contribution
It presents a joint learning framework for natural language grounding and behavioral policies using narrated visual demonstrations, with a new dataset and empirical validation.
Findings
Agents learn visual reward detectors with few examples
Pick-and-place policies are trained using the learned detectors as rewards
Agents generalize to novel objects and locations using natural language instructions
Abstract
Humans effortlessly "program" one another by communicating goals and desires in natural language. In contrast, humans program robotic behaviours by indicating desired object locations and poses to be achieved, by providing RGB images of goal configurations, or by supplying a demonstration to be imitated. None of these methods generalize across environment variations, and they convey the goal in awkward technical terms. This work proposes joint learning of natural language grounding and instructable behavioural policies reinforced by perceptual detectors of natural language expressions, grounded to the sensory inputs of the robotic agent. Our supervision is narrated visual demonstrations (NVD), which are visual demonstrations paired with verbal narration (as opposed to being silent). We introduce a dataset of NVD where teachers perform activities while describing them in detail. We map the…
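The abstract describes policies reinforced by perceptual detectors of natural language expressions, and the findings say such detectors are learned from few examples. A minimal sketch of that reward loop follows, under loud assumptions: the class `ExpressionDetector` and functions `train_detector` and `detector_reward` are hypothetical names; each scene is reduced to two hand-picked numbers (height of the object above the target and horizontal offset, in cm) instead of the robot's visual input; logistic regression is a swapped-in learner, not the authors' model; and the 0.5 threshold for a binary reward is an illustrative choice.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class ExpressionDetector:
    """Hypothetical detector for one expression, e.g. "the cube is on the bowl".

    Scores a hand-crafted scene feature vector with a logistic model; a stand-in
    for a detector that would read the robot's sensory input.
    """
    weights: List[float]
    bias: float

    def probability(self, features: Sequence[float]) -> float:
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return sigmoid(z)


def train_detector(examples, labels, lr=0.1, epochs=500) -> ExpressionDetector:
    """Fit the detector from a handful of labelled scenes (per-example logistic-loss steps)."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            grad = p - y  # derivative of the logistic loss with respect to the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return ExpressionDetector(weights, bias)


def detector_reward(detector, features, threshold=0.5) -> float:
    """Sparse reward: 1.0 once the detector judges the instructed goal satisfied."""
    return 1.0 if detector.probability(features) >= threshold else 0.0


# Hypothetical scenes: (height of object above target, horizontal offset), in cm.
on_top = [(5.0, 1.0), (4.0, 2.0), (6.0, 0.0)]
not_on_top = [(0.0, 30.0), (-5.0, 2.0), (5.0, 40.0)]
detector = train_detector(on_top + not_on_top, [1] * 3 + [0] * 3)
print([detector_reward(detector, s) for s in on_top + not_on_top])
```

In a reinforcement learning setup of this kind, `detector_reward` would be queried after each action so that a pick-and-place policy is rewarded only when the instructed relation holds; a binary reward keeps the sketch simple, though the detector's probability could be used directly as a dense reward instead.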
