Playing hard exploration games by watching YouTube
Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang,, Nando de Freitas

TL;DR
This paper introduces a novel reinforcement learning approach that uses unaligned YouTube videos to guide exploration in sparse reward environments, enabling agents to outperform humans on challenging games without environment rewards.
Contribution
The method leverages self-supervised learning on unaligned videos to create a reward function from YouTube footage, bypassing the need for environment-specific demonstrations.
Findings
Agents surpass human performance on Montezuma's Revenge, Pitfall!, and Private Eye.
The approach works without access to environment rewards or aligned demonstrations.
It demonstrates effective one-shot imitation from unaligned, noisy videos.
Abstract
Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Multimodal Machine Learning Applications
