Visual Semantic Navigation using Scene Priors

Wei Yang; Xiaolong Wang; Ali Farhadi; Abhinav Gupta; Roozbeh Mottaghi

arXiv:1810.06543 · cs.CV · October 16, 2018 · 37 citations

TL;DR

This paper introduces a method that integrates semantic scene priors into deep reinforcement learning for navigation, significantly improving performance and generalization in unseen environments.

Contribution

The paper uses Graph Convolutional Networks to incorporate semantic priors into a deep reinforcement learning navigation agent, improving its ability to generalize to unseen scenes and objects.

Findings

1. Semantic knowledge improves navigation performance.
2. The method generalizes to unseen scenes and objects.
3. Significant performance gains are demonstrated in AI2-THOR.

Abstract

How do humans navigate to target objects in novel scenes? Do we use the semantic/functional priors we have built over years to efficiently search and navigate? For example, to search for mugs, we search cabinets near the coffee machine and for fruits we try the fridge. In this work, we focus on incorporating semantic priors in the task of semantic navigation. We propose to use Graph Convolutional Networks for incorporating the prior knowledge into a deep reinforcement learning framework. The agent uses the features from the knowledge graph to predict the actions. For evaluation, we use the AI2-THOR framework. Our experiments show that semantic knowledge improves performance significantly. More importantly, we show improvement in generalization to unseen scenes and/or objects. The supplementary video is available at https://youtu.be/otKjuO805dE
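
To make the idea of extracting features from a knowledge graph concrete, here is a minimal sketch of the standard graph convolution propagation rule, H' = ReLU(Â H W) with Â = D^{-1/2}(A + I)D^{-1/2}, written in NumPy. It is not the authors' implementation: the toy object graph, the one-hot node features, the layer sizes, the random weights, and the function names `normalize_adjacency` and `gcn_layer` are all assumptions chosen for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}: add self-loops, then normalize symmetrically."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)  # every degree is >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: mix neighbor features, project, apply ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)


# Hypothetical toy prior: which objects tend to be found near each other.
objects = ["mug", "coffee machine", "cabinet", "fridge", "fruit"]
edges = [(0, 1), (0, 2), (3, 4)]  # mug-coffee machine, mug-cabinet, fridge-fruit

adj = np.zeros((len(objects), len(objects)))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0  # undirected relations

rng = np.random.default_rng(0)
features = np.eye(len(objects))  # one-hot node features (assumption)
w1 = rng.normal(size=(5, 8))     # untrained weights, for shape illustration only
w2 = rng.normal(size=(8, 4))

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, w1)
node_embeddings = gcn_layer(a_norm, hidden, w2)

# Flatten the per-node embeddings into one vector that a policy network
# could take as input alongside its other observations (assumption).
graph_feature = node_embeddings.reshape(-1)
print(node_embeddings.shape, graph_feature.shape)
```

In a trained agent the weights would be learned jointly with the policy; here they are random, so only the shapes and the neighbor-mixing structure are meaningful.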

Code & Models

Repositories

barmayo/spatial_attention (PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques

Methods: Graph Convolutional Networks