Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation
Marco Rosano, Antonino Furnari, Luigi Gulino, Corrado Santoro, Giovanni Maria Farinella

TL;DR
This paper introduces a benchmark of deep learning models that use mid-level visual representations to improve real-world indoor navigation. It demonstrates the benefits of multi-modal inputs and proposes a validation tool for efficiently evaluating models transferred from simulation to real robots.
Contribution
It proposes a comprehensive benchmark for combining mid-level visual representations in deep learning-based navigation and introduces a validation tool for assessing real-world performance.
Findings
Multi-modal input improves navigation performance.
The validation tool accurately estimates real-world navigation success.
Models trained in simulation transfer effectively to real environments.
Abstract
Navigating complex indoor environments requires a deep understanding of the space in which the robotic agent acts, so that its navigation towards the goal location is correctly informed. In recent learning-based navigation approaches, the agent acquires scene understanding and navigation abilities simultaneously by collecting the required experience in simulation. Unfortunately, even though simulators are an efficient tool for training navigation policies, the resulting models often fail when transferred to the real world. One possible solution is to provide the navigation model with mid-level visual representations that capture important domain-invariant properties of the scene. But which representations best facilitate the transfer of a model to the real world? How can they be combined? In this work we address these issues by proposing a benchmark of…
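To make the question of how mid-level representations can be combined more concrete, here is a minimal sketch contrasting two generic fusion strategies: early fusion (stacking the per-pixel maps channel-wise before any encoding) and late fusion (encoding each representation separately and concatenating the resulting feature vectors). This is an illustrative assumption, not the authors' fusion models. The representation names ("depth", "edges"), the function names, and the toy `mean_pool_encoder` (a hypothetical stand-in for a learned encoder) are all invented for illustration.

```python
from typing import Callable, Dict, List

# A single-channel mid-level map (hypothetical example: depth or an edge map), H x W.
Map2D = List[List[float]]


def early_fusion(reps: Dict[str, Map2D], order: List[str]) -> List[List[List[float]]]:
    """Stack the maps channel-wise into an H x W x C grid (one channel per representation)."""
    first = reps[order[0]]
    h, w = len(first), len(first[0])
    # Early fusion only works if every representation has the same spatial size.
    for name in order:
        m = reps[name]
        if len(m) != h or any(len(row) != w for row in m):
            raise ValueError(f"representation {name!r} has a different spatial size")
    return [[[reps[name][i][j] for name in order] for j in range(w)] for i in range(h)]


def mean_pool_encoder(m: Map2D) -> List[float]:
    """Hypothetical toy encoder standing in for a learned network: returns [mean, max]."""
    flat = [v for row in m for v in row]
    return [sum(flat) / len(flat), max(flat)]


def late_fusion(
    reps: Dict[str, Map2D],
    order: List[str],
    encoder: Callable[[Map2D], List[float]] = mean_pool_encoder,
) -> List[float]:
    """Encode each representation separately, then concatenate the feature vectors."""
    features: List[float] = []
    for name in order:
        features.extend(encoder(reps[name]))
    return features


if __name__ == "__main__":
    reps = {
        "depth": [[1.0, 2.0], [3.0, 4.0]],
        "edges": [[0.0, 1.0], [1.0, 0.0]],
    }
    order = ["depth", "edges"]
    print(early_fusion(reps, order)[0][1])  # [2.0, 1.0]
    print(late_fusion(reps, order))         # [2.5, 4.0, 0.5, 1.0]
```

Early fusion requires all representations to share the same spatial resolution and feeds them to a single encoder. Late fusion lets each representation have its own encoder, which in a learned setting typically means more parameters but more freedom to specialize per modality.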
Taxonomy
Topics: Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications · Advanced Vision and Imaging
