TL;DR
This paper introduces a semantic visual navigation approach that feeds high-level features, such as segmentation and detection masks, to a deep network that learns a navigation policy for autonomous agents in complex environments.
Contribution
It proposes a novel method combining real and synthetic data for training semantic representations and navigation policies without domain adaptation.
Findings
Achieves a 54% success rate when navigating to target objects in unexplored environments.
Outperforms non-learning-based approaches (46%) and baseline learning methods (28%).
Utilizes off-the-shelf vision models trained on large datasets for effective navigation.
Abstract
What is a good visual representation for autonomous agents? We address this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a complex environment to a target object, e.g. go to the refrigerator. Instead of acquiring a metric semantic map of an environment and using planning for navigation, our approach learns navigation policies on top of representations that capture spatial layout and semantic contextual cues. We propose to use high-level semantic and contextual features, including segmentation and detection masks obtained by off-the-shelf state-of-the-art vision models, as observations, and to use a deep network to learn the navigation policy. This choice allows using additional data, from orthogonal sources, to better train different parts of the model: the representation extraction is trained on large standard vision datasets while…
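As a rough illustration of the "detection masks as observations" idea described in the abstract, the sketch below rasterizes detector output into one binary mask per object class, which a policy network could then consume as input channels. The function name `detection_masks`, the box format, and the class list are assumptions made for illustration; they are not taken from the paper, and the learned navigation policy itself is not shown.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), with max exclusive.
Box = Tuple[int, int, int, int]


def detection_masks(
    detections: List[Tuple[str, Box]],
    classes: List[str],
    height: int,
    width: int,
) -> List[List[List[int]]]:
    """Rasterize detections into one binary mask per class (channels first)."""
    index = {name: i for i, name in enumerate(classes)}
    masks = [[[0] * width for _ in range(height)] for _ in classes]
    for name, (x0, y0, x1, y1) in detections:
        if name not in index:
            continue  # skip classes that have no channel in the observation
        channel = masks[index[name]]
        # Clip the box to the image bounds before filling it in.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                channel[y][x] = 1
    return masks


# Toy example: one detection of the target class, one of an untracked class.
classes = ["refrigerator", "chair"]
detections = [("refrigerator", (1, 0, 3, 2)), ("sofa", (0, 0, 4, 4))]
obs = detection_masks(detections, classes, height=4, width=4)
```

Because such masks have the same form whether the detector runs on real images or on a simulator's rendered frames, an observation built this way would not depend on raw pixel appearance.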
