Pre-Trained Masked Image Model for Mobile Robot Navigation
Vishnu Dutt Sharma, Anukriti Singh, Pratap Tokekar

TL;DR
This paper demonstrates that pre-trained Masked Autoencoders can be effectively used for various mobile robot navigation tasks, such as map expansion and exploration, without additional fine-tuning, highlighting the potential of foundational vision models in robotics.
Contribution
The study shows that existing pre-trained vision networks can be directly applied to robot navigation tasks, eliminating the need for task-specific training datasets.
Findings
Pre-trained Masked Autoencoders can expand a robot's field of view for navigation without any fine-tuning.
They are effective for single-agent topological exploration.
They are applicable to multi-agent exploration for indoor mapping.
Abstract
2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas. Typically, the robot builds the navigation maps incrementally from local observations using onboard sensors. Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency. While many such works build task-specific networks using limited datasets, we show that the existing foundational vision networks can accomplish the same without any fine-tuning. Specifically, we use Masked Autoencoders, pre-trained on street images, to present novel applications for field-of-view expansion, single-agent topological exploration, and multi-agent exploration for indoor mapping, across different input modalities. Our work motivates the use of foundational vision models for generalized structure…
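The abstract does not describe how a map is presented to the Masked Autoencoder, so the following is only a minimal sketch of one plausible preprocessing step, not the authors' pipeline. It assumes the top-down map is a grid with a per-cell "observed" flag and that every patch without any observed cell is marked as masked, leaving it for the model to predict. The function name `unobserved_patch_mask` and the `patch_size` parameter are hypothetical.

```python
def unobserved_patch_mask(observed, patch_size):
    """Return a row-major grid of booleans, one per patch.

    True means the patch contains no observed cell and would be masked
    (assumption: the model is asked to predict exactly these patches).
    """
    rows, cols = len(observed), len(observed[0])
    if rows % patch_size or cols % patch_size:
        raise ValueError("map dimensions must be multiples of patch_size")
    mask = []
    for r0 in range(0, rows, patch_size):
        mask_row = []
        for c0 in range(0, cols, patch_size):
            # A patch counts as seen if any cell inside it was observed.
            seen = any(
                observed[r][c]
                for r in range(r0, r0 + patch_size)
                for c in range(c0, c0 + patch_size)
            )
            mask_row.append(not seen)
        mask.append(mask_row)
    return mask


# Hypothetical 4x4 map: the robot has observed only the left half.
observed = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
mask = unobserved_patch_mask(observed, 2)
print(mask)  # [[False, True], [False, True]]
```

In this toy case the two right-hand patches are masked, which corresponds to asking a model to extend the map beyond the sensor's current field of view.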
Taxonomy
Topics: Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques
