Semantic Visual Navigation by Watching YouTube Videos

Matthew Chang; Arjun Gupta; Saurabh Gupta

arXiv:2006.10034 · cs.CV · October 28, 2020 · 23 cites

TL;DR

This paper introduces a method for semantic visual navigation using passive YouTube videos, enabling agents to learn meaningful cues for object-oriented navigation without explicit labels or optimal demonstrations.

Contribution

The paper presents a novel off-policy Q-learning approach trained on pseudo-labeled transitions from passive videos, which improves navigation efficiency in simulated environments.

Findings

01  Achieved a relative improvement of 15-83% over baseline methods, including end-to-end RL

02  Learned semantic cues for navigation from unlabeled passive videos

03  Improved navigation efficiency on the ObjectGoal task with minimal interaction

Abstract

Semantic cues and statistical regularities in real-world environment layouts can improve efficiency for navigation in novel environments. This paper learns and leverages such semantic cues for navigating to objects of interest in novel environments, by simply watching YouTube videos. This is challenging because YouTube videos don't come with labels for actions or goals, and may not even showcase optimal behavior. Our method tackles these challenges through the use of Q-learning on pseudo-labeled transition quadruples (image, action, next image, reward). We show that such off-policy Q-learning from passive data is able to learn meaningful semantic cues for navigation. These cues, when used in a hierarchical navigation policy, lead to improved efficiency at the ObjectGoal task in visually realistic simulations. We observe a relative improvement of 15-83% over end-to-end RL, behavior…
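The Q-learning step on transition quadruples described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the paper works on images, while here short string names stand in for images, and the action and reward labels are written by hand because this summary does not describe how the pseudo-labels are produced. The function name, the toy transitions, and the hyperparameters (`gamma`, `lr`, `sweeps`) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical pseudo-labeled quadruples: (state, action, next_state, reward).
# Strings stand in for images; reward 1.0 marks reaching the goal object.
TRANSITIONS = [
    ("hall", "forward", "kitchen_door", 0.0),
    ("kitchen_door", "forward", "fridge", 1.0),
    ("hall", "left", "bedroom", 0.0),
    ("bedroom", "forward", "bedroom", 0.0),
]
ACTIONS = ["forward", "left"]


def q_learning_from_transitions(transitions, actions, gamma=0.9, lr=0.5, sweeps=200):
    """Off-policy Q-learning over a fixed dataset of transition quadruples.

    No environment interaction happens here: the table is fit purely from
    recorded transitions, which need not come from an optimal policy.
    """
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    for _ in range(sweeps):
        for state, action, next_state, reward in transitions:
            # Bootstrapped target: reward plus discounted best next-state value.
            best_next = max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            # Move the current estimate part of the way toward the target.
            q[(state, action)] += lr * (target - q[(state, action)])
    return q


q_values = q_learning_from_transitions(TRANSITIONS, ACTIONS)
best_from_hall = max(ACTIONS, key=lambda a: q_values[("hall", a)])
```

In this toy dataset the learned values rank "forward" above "left" from the hall, because only the forward branch eventually reaches the rewarded state. That ranking is the kind of cue a higher-level navigation policy could consult when choosing where to go next.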

Code & Models

Repositories

MatthewChang/video-dqn (PyTorch, official)

Videos

Semantic Visual Navigation by Watching YouTube Videos · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Q-Learning