# Vision-based Navigation Using Deep Reinforcement Learning

**Authors:** Jonáš Kulhánek, Erik Derner, Tim de Bruin, Robert Babuška

arXiv: 1908.03627 · 2019-11-12

## TL;DR

This paper extends the batched A2C algorithm with auxiliary segmentation and depth-prediction tasks so that an agent, e.g. a mobile robot, can navigate to a target specified by an image in realistic simulated environments such as AI2-THOR.

## Contribution

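The paper extends the batched A2C algorithm with three auxiliary tasks: segmentation of the observation image, segmentation of the target image, and depth-map prediction. These tasks allow a large part of the network to be pre-trained with supervised learning and substantially reduce the number of training steps needed in realistic visual environments.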

## Key findings

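- Outperforms state-of-the-art goal-oriented visual navigation methods on the AI2-THOR simulator
- Auxiliary tasks (observation segmentation, target segmentation, depth prediction) allow supervised pre-training of a large part of the network
- Gradually increasing environment complexity during training further improves training performance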
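The idea of raising environment complexity over time can be illustrated with a small schedule function. This is only a sketch: the summary does not say how complexity is measured or scheduled in the paper, so the linear ramp, the integer levels, and every name and default here (`curriculum_level`, `min_level`, `max_level`, `ramp_fraction`) are assumptions made for illustration.

```python
def curriculum_level(step, total_steps, min_level=1, max_level=10, ramp_fraction=0.5):
    """Map a training step to an integer complexity level (hypothetical schedule).

    The level rises linearly from min_level to max_level over the first
    ramp_fraction of training and then stays at max_level.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 < ramp_fraction <= 1.0:
        raise ValueError("ramp_fraction must be in (0, 1]")

    # Number of steps over which the complexity is increased.
    ramp_steps = max(1, int(total_steps * ramp_fraction))
    # Progress through the ramp, clamped to [0, 1].
    progress = min(1.0, max(0.0, step / ramp_steps))
    return min_level + int(progress * (max_level - min_level))


if __name__ == "__main__":
    for step in (0, 250, 500, 1000):
        print(step, curriculum_level(step, total_steps=1000))
```

A training loop could use the returned level to decide, for example, how far from the target an episode is allowed to start; that interpretation is likewise an assumption, not something stated in this summary.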

## Abstract

Deep reinforcement learning (RL) has been successfully applied to a variety of game-like environments. However, the application of deep RL to visual navigation with realistic environments is a challenging task. We propose a novel learning architecture capable of navigating an agent, e.g. a mobile robot, to a target given by an image. To achieve this, we have extended the batched A2C algorithm with auxiliary tasks designed to improve visual navigation performance. We propose three additional auxiliary tasks: predicting the segmentation of the observation image, predicting the segmentation of the target image, and predicting the depth map. These tasks enable the use of supervised learning to pre-train a large part of the network and to reduce the number of training steps substantially. The training performance has been further improved by increasing the environment complexity gradually over time. An efficient neural network structure is proposed, which is capable of learning for multiple targets in multiple environments. Our method navigates in continuous state spaces and, on the AI2-THOR environment simulator, outperforms state-of-the-art goal-oriented visual navigation methods from the literature.
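To make the combination of A2C with auxiliary tasks concrete, the sketch below adds auxiliary prediction errors to a standard single-sample A2C loss. It is a minimal illustration on plain Python floats, not the authors' implementation: the pixel-wise mean squared error, the loss coefficients (`value_coef`, `entropy_coef`, `aux_weight`), and all function names are assumptions, since this summary does not give the paper's loss formulation.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally long, flattened images."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def a2c_loss(action_probs, action, value, ret, value_coef=0.5, entropy_coef=0.01):
    """Standard A2C loss for one sample (coefficients are hypothetical)."""
    # Advantage of the taken action; in an autodiff framework it would be
    # treated as a constant inside the policy term.
    advantage = ret - value
    policy_loss = -math.log(action_probs[action]) * advantage
    value_loss = 0.5 * advantage ** 2
    # Entropy bonus encourages exploration.
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0)
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


def total_loss(rl_loss, aux_preds, aux_targets, aux_weight=0.1):
    """Add the weighted auxiliary prediction errors to the RL loss."""
    aux = sum(mse(aux_preds[name], aux_targets[name]) for name in aux_targets)
    return rl_loss + aux_weight * aux


if __name__ == "__main__":
    rl = a2c_loss([0.5, 0.5], action=0, value=0.0, ret=1.0)
    # Tiny two-pixel stand-ins for the three auxiliary outputs.
    preds = {"depth": [0.2, 0.4], "obs_seg": [0.0, 1.0], "target_seg": [1.0, 0.0]}
    targets = {"depth": [0.0, 0.5], "obs_seg": [0.0, 1.0], "target_seg": [1.0, 1.0]}
    print(total_loss(rl, preds, targets))
```

Because the auxiliary targets (segmentation masks and depth maps) are available from the simulator as labels, the same `mse` terms could also be minimized on their own in a supervised pre-training phase before any RL updates, which is the role the abstract attributes to these tasks.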

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.03627/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/1908.03627/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1908.03627/full.md

---
Source: https://tomesphere.com/paper/1908.03627