Spatio-Temporal Graph Localization Networks for Image-based Navigation
Takahiro Niwa, Shun Taguchi, Noriaki Hirose

TL;DR
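This paper introduces a learning-based spatio-temporal graph neural network for image-based robot localization. It combines spatial cues from the topological map with temporal cues from the robot's image sequence to stay accurate in environments with similar-looking images, and it uses semi-supervised sim2real transfer because ground-truth poses are hard to obtain for real-world images.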
Contribution
It proposes a novel CNN-GNN architecture with semi-supervised sim2real transfer for improved localization in challenging indoor environments.
Findings
Outperforms state-of-the-art baselines in environments with similar images.
Significantly improves navigation accuracy in both simulation and real-world tests.
Effectively leverages semi-supervised learning with simulator and real images.
Abstract
Localization in topological maps is essential for image-based navigation using an RGB camera. Localization using only one camera can be challenging in medium-to-large-sized environments because similar-looking images are often observed repeatedly, especially in indoor environments. To overcome this issue, we propose a learning-based localization method that simultaneously utilizes the spatial consistency from topological maps and the temporal consistency from time-series images captured by the robot. Our method combines a convolutional neural network (CNN) to embed image features and a recurrent-type graph neural network to perform accurate localization. When training our model, it is difficult to obtain the ground truth pose of the robot when capturing images in real-world environments. Hence, we propose a sim2real transfer approach with semi-supervised learning that leverages…
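The idea of combining spatial consistency (the map's edges) with temporal consistency (the image sequence) can be illustrated with a small sketch. This is not the authors' learned CNN-GNN model: it swaps in a hand-written discrete Bayes filter over graph nodes, and every name in it (`propagate`, `localize_step`, the `stay` and `temperature` parameters, the toy 3-dimensional embeddings standing in for CNN features) is a hypothetical choice made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def softmax(scores, temperature=0.1):
    """Turn similarity scores into a normalized likelihood over nodes."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def propagate(belief, neighbors, stay=0.5):
    """Spatial step: each node keeps `stay` of its belief and splits
    the rest evenly among its neighbors in the topological map."""
    new = [0.0] * len(belief)
    for i, b in enumerate(belief):
        if not neighbors[i]:
            new[i] += b  # isolated node keeps all of its belief
            continue
        new[i] += stay * b
        share = (1.0 - stay) * b / len(neighbors[i])
        for j in neighbors[i]:
            new[j] += share
    return new

def localize_step(belief, neighbors, node_embeddings, obs_embedding):
    """Temporal step: carry the previous belief forward along the graph,
    then reweight it by how well the new image matches each node."""
    prior = propagate(belief, neighbors)
    likelihood = softmax([cosine(e, obs_embedding) for e in node_embeddings])
    posterior = [p * l for p, l in zip(prior, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]

# Toy corridor 0 - 1 - 2 - 3 in which nodes 0 and 3 look identical.
neighbors = [[1], [0, 2], [1, 3], [2]]
node_embeddings = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]  # assumed CNN features
observations = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # robot moves 1 -> 2 -> 3

# A single-image match cannot separate node 0 from node 3.
single_frame = softmax([cosine(e, observations[-1]) for e in node_embeddings])

# The recurrent update over the graph can, because it remembers the path.
belief = [0.25] * 4
for obs in observations:
    belief = localize_step(belief, neighbors, node_embeddings, obs)
estimate = max(range(len(belief)), key=lambda i: belief[i])
```

In this toy run the last image matches nodes 0 and 3 equally well, so `single_frame` ties between them, while `estimate` settles on node 3 because belief could only have reached it through node 2. A learned model would replace the fixed transition and similarity rules with trained message-passing and embedding functions.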
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
