Navigating to Objects in the Real World
Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot

TL;DR
This paper empirically compares classical, modular, and end-to-end learning methods for semantic visual navigation on real robots, highlighting modular learning's robustness and identifying simulation gaps affecting real-world performance.
Contribution
It provides the first large-scale real-world evaluation of semantic navigation methods, demonstrating modular learning's effectiveness and analyzing simulation-to-reality transfer issues.
Findings
Modular learning achieves a 90% success rate in real-world navigation.
End-to-end learning's success rate drops from 77% in simulation to 23% in reality.
Sim-to-real gaps in images and error modes make simulation-based evaluation unreliable.
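To put the end-to-end numbers above in perspective, the short sketch below computes the absolute and relative drop between the two reported success rates. The helper name `sim_to_real_gap` is hypothetical and used only for illustration; the only values taken from this summary are the 77% and 23% figures.

```python
def sim_to_real_gap(sim_success: float, real_success: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction).

    Hypothetical helper for illustration; not part of the paper's code.
    """
    absolute = sim_success - real_success
    relative = absolute / sim_success
    return absolute, relative


# Success rates reported above for end-to-end learning (in percent).
absolute, relative = sim_to_real_gap(77.0, 23.0)
print(f"absolute drop: {absolute:.0f} points, relative drop: {relative:.0%}")
# prints "absolute drop: 54 points, relative drop: 70%"
```

Read this way, the end-to-end method loses roughly seven tenths of its simulated success rate when moved to a real robot.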
Abstract
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end learning approaches reactively map sensor inputs to actions with deep neural networks, while modular learning approaches enrich the classical pipeline with learning-based semantic sensing and exploration. But learned visual navigation policies have predominantly been evaluated in simulation. How well do different classes of methods work on a robot? We present a large-scale empirical study of semantic visual navigation methods comparing representative methods from classical, modular, and end-to-end learning approaches…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Domain Adaptation and Few-Shot Learning
