Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships

Yunlian Lv, Ning Xie, Yimin Shi, Zijiao Wang, and Heng Tao Shen

arXiv:2005.02153 · cs.CV · May 6, 2020 · 21 cites

Open Access

TL;DR

This paper enhances target-driven visual navigation in 3D environments by integrating attention on 3D knowledge graphs and a target skill extension module within deep reinforcement learning, improving efficiency, obstacle avoidance, and generalization.

Contribution

It introduces a novel approach combining 3D spatial knowledge graphs and sub-target learning to improve navigation performance and generalization in embodied AI tasks.

Findings

01. Outperforms baseline methods in SR and SPL metrics
02. Improves generalization across unseen targets and scenes
03. Effectively incorporates 3D spatial relationships and sub-targets
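
The findings refer to SR and SPL without defining them. The sketch below assumes they carry their usual meaning in embodied navigation: success rate, and Success weighted by Path Length, computed as the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is the success flag, l_i the shortest-path length, and p_i the length of the path the agent actually took. The function names and episode values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    agent_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, assuming the common definition:
    mean over episodes of S_i * l_i / max(p_i, l_i)."""
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if len(shortest_lengths) != n or len(agent_lengths) != n:
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for done, shortest, taken in zip(successes, shortest_lengths, agent_lengths):
        if not done:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # An episode that starts at the target has zero length; count it as optimal.
        total += shortest / denom if denom > 0 else 1.0
    return total / n


# Hypothetical episodes: one success with a detour, one failure, one optimal success.
episode_success = [True, False, True]
episode_shortest = [4.0, 5.0, 2.0]
episode_taken = [8.0, 5.0, 2.0]

sr_value = success_rate(episode_success)
spl_value = spl(episode_success, episode_shortest, episode_taken)
print(f"SR={sr_value:.3f} SPL={spl_value:.3f}")
```

Under this definition SPL can never exceed SR, since each successful episode contributes at most 1. A method that succeeds often but takes long detours therefore scores well on SR and poorly on SPL.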

Abstract

Embodied artificial intelligence (AI) shifts the focus from tasks on internet images to active settings in which embodied agents perceive and act within 3D environments. In this paper, we investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes. The navigation task aims to train an agent that can intelligently make a series of decisions to reach a pre-specified target location from any possible starting position, based only on egocentric views. However, most current navigation methods struggle with several challenging problems, such as data efficiency, automatic obstacle avoidance, and generalization. The generalization problem means that the agent cannot transfer navigation skills learned from previous experience to unseen targets and scenes. To address these issues, we incorporate two designs into…

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Reinforcement Learning in Robotics