Playing hard exploration games by watching YouTube

Yusuf Aytar; Tobias Pfaff; David Budden; Tom Le Paine; Ziyu Wang; Nando de Freitas

arXiv:1805.11592 · cs.LG · December 3, 2018 · 6 cites

TL;DR

This paper introduces a reinforcement learning approach that uses unaligned YouTube videos to guide exploration in sparse-reward environments, enabling agents to exceed human performance on hard-exploration Atari games without access to environment rewards.

Contribution

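The method uses self-supervised learning to map unaligned videos from multiple sources into a shared representation. It then builds a reward function from a single YouTube video embedded in that representation, removing the need for demonstrations recorded in the agent's exact environment setup.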
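
The self-supervised step can be illustrated with a temporal objective. Two frames are drawn from the same video, and a network is trained to classify how far apart in time they are. This requires no labels, actions, or rewards. The sketch below only generates such training pairs. The function names and the frame-distance buckets are hypothetical choices for illustration, and the embedding network and classifier are omitted.

```python
import random
from typing import List, Tuple

# Hypothetical frame-distance buckets: each (low, high) range is inclusive and
# its list index is the class label the network must predict.
DISTANCE_BUCKETS: List[Tuple[int, int]] = [
    (0, 0), (1, 1), (2, 2), (3, 4), (5, 20), (21, 200),
]


def distance_to_label(distance: int) -> int:
    """Map an absolute frame distance to its bucket index."""
    for label, (low, high) in enumerate(DISTANCE_BUCKETS):
        if low <= distance <= high:
            return label
    raise ValueError(f"distance {distance} is outside every bucket")


def sample_tdc_pair(num_frames: int, rng: random.Random) -> Tuple[int, int, int]:
    """Draw two frame indices from one video and return (i, j, label)."""
    # Only buckets whose smallest distance fits inside the video are usable.
    feasible = [k for k, (low, _) in enumerate(DISTANCE_BUCKETS)
                if low <= num_frames - 1]
    if not feasible:
        raise ValueError("video must contain at least one frame")
    # Choosing the bucket first balances short and long distances in the data.
    label = rng.choice(feasible)
    low, high = DISTANCE_BUCKETS[label]
    distance = rng.randint(low, min(high, num_frames - 1))
    i = rng.randint(0, num_frames - 1 - distance)
    j = i + distance
    if rng.random() < 0.5:
        i, j = j, i  # random order: the label depends only on |i - j|
    return i, j, label


if __name__ == "__main__":
    rng = random.Random(0)
    # Pairs would be drawn from many unaligned videos; one length is used here.
    for _ in range(3):
        print(sample_tdc_pair(num_frames=500, rng=rng))
```

The bucket is sampled before the distance, so rare long gaps are not swamped by the many short ones. Because pairs come from within a single video, footage from different sources never needs to be aligned.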

Findings

01

Agents surpass human performance on Montezuma's Revenge, Pitfall!, and Private Eye.

02

The approach works without access to environment rewards or aligned demonstrations.

03

Imitation is one-shot: a single noisy, unaligned YouTube video is enough to define the reward.
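
Suppose an embedding function has already been trained. One way to turn a single demonstration into a reward is to place checkpoints along the embedded YouTube video. The agent then receives a small bonus each time its current observation lands close enough to the next checkpoint. The class below is a hypothetical sketch of that idea. The checkpoint spacing, similarity threshold, bonus size, and skip allowance are assumed values, not settings quoted from the paper.

```python
import numpy as np


def _normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)


class CheckpointReward:
    """Hypothetical sparse imitation reward built from one embedded demo video."""

    def __init__(self, demo_embeddings, spacing=16, threshold=0.5,
                 bonus=0.5, max_skip=1):
        demo = np.asarray(demo_embeddings, dtype=np.float64)
        # Keep every `spacing`-th demo frame as a checkpoint (assumed spacing).
        self.checkpoints = _normalize(demo[::spacing])
        self.threshold = threshold  # assumed cosine-similarity cutoff
        self.bonus = bonus          # assumed reward per checkpoint reached
        self.max_skip = max_skip    # checkpoints the agent may jump over
        self.next_index = 0

    def __call__(self, agent_embedding) -> float:
        """Return the bonus if the agent's embedding matches an upcoming checkpoint."""
        if self.next_index >= len(self.checkpoints):
            return 0.0  # demo already fully imitated
        e = _normalize(np.asarray(agent_embedding, dtype=np.float64))
        end = min(self.next_index + self.max_skip + 1, len(self.checkpoints))
        sims = self.checkpoints[self.next_index:end] @ e
        hits = np.flatnonzero(sims > self.threshold)
        if hits.size == 0:
            return 0.0
        # Advance past the furthest matched checkpoint so none pays out twice.
        self.next_index += int(hits[-1]) + 1
        return self.bonus


if __name__ == "__main__":
    demo = np.eye(4)  # four toy "frames" with orthogonal embeddings
    reward = CheckpointReward(demo, spacing=1, threshold=0.9)
    print([reward(demo[k]) for k in (0, 0, 2, 3, 3)])  # [0.5, 0.0, 0.5, 0.5, 0.0]
```

Checkpoints must be reached in order. This keeps the agent moving along the demonstrated route rather than farming repeated payouts from an early state. The skip window tolerates small differences between the agent's trajectory and the video.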

Abstract

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human…
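
The objective over modality exploits the fact that a video's soundtrack and frames are naturally synchronized. This minimal sketch assumes a binary formulation: does this audio snippet come from the same moment as this frame? Aligned pairs serve as positives, and pairs separated by at least a minimum offset serve as negatives. The function names, the offset, and the audio window length are hypothetical. Feature extraction and the classifier itself are left out.

```python
import random
from typing import Tuple


def sample_cross_modal_pair(num_frames: int, min_negative_offset: int,
                            rng: random.Random) -> Tuple[int, int, int]:
    """Return (frame_index, audio_frame_index, label) for one video.

    label 1: audio is taken at the same moment as the frame (aligned).
    label 0: audio is at least `min_negative_offset` frames away (misaligned).
    """
    if min_negative_offset < 1:
        raise ValueError("min_negative_offset must be at least 1")
    if num_frames <= min_negative_offset:
        raise ValueError("video too short to draw a misaligned pair")
    if rng.random() < 0.5:
        t = rng.randrange(num_frames)
        return t, t, 1
    # Draw the offset first, then a start position that keeps both indices valid.
    offset = rng.randint(min_negative_offset, num_frames - 1)
    frame = rng.randint(0, num_frames - 1 - offset)
    audio = frame + offset
    if rng.random() < 0.5:
        frame, audio = audio, frame  # audio may lead or lag the frame
    return frame, audio, 0


def audio_slice_bounds(frame_index: int, fps: float, sample_rate: int,
                       window_seconds: float) -> Tuple[int, int]:
    """Waveform sample range [start, end) for the audio window at a video frame."""
    start = round(frame_index * sample_rate / fps)
    return start, start + round(window_seconds * sample_rate)


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_cross_modal_pair(num_frames=300, min_negative_offset=10, rng=rng))
    print(audio_slice_bounds(30, fps=30.0, sample_rate=16000, window_seconds=0.5))
    # second line prints (16000, 24000)
```

Because the supervision comes from each video's own timing, clips from different sources can be mixed freely. No cross-video alignment is needed.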

Code & Models

Repositories

MaxSobolMark/HardRLWithYoutube (TensorFlow)

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications