Context-Based Visual-Language Place Recognition
Soojin Woo, Seong-Woo Kim

TL;DR
This paper introduces a visual place recognition method built on a zero-shot, language-driven semantic segmentation model. The method is robust to scene changes, requires no additional training, and outperforms non-learned image representations and off-the-shelf CNN descriptors.
Contribution
The authors propose a novel VPR approach using pixel-level embeddings from a zero-shot semantic segmentation model, eliminating the need for training and improving robustness to scene variations.
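As a rough illustration of how pixel-level embeddings could feed a place recognition step, the sketch below pools per-pixel vectors into one descriptor per image and matches a query against a small database by cosine similarity. The function names (pool_descriptor, recognize_place), the mean-pooling step, and the toy 2-D vectors are all assumptions made for this example; the summary does not describe the authors' actual aggregation or matching procedure.

```python
import math

def pool_descriptor(pixel_embeddings):
    """Average per-pixel embedding vectors into one L2-normalized descriptor.

    Mean pooling is an assumption for illustration only; it is not taken
    from the paper.
    """
    dim = len(pixel_embeddings[0])
    mean = [sum(vec[i] for vec in pixel_embeddings) / len(pixel_embeddings)
            for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]

def recognize_place(query, database):
    """Return the key of the database descriptor most similar to the query.

    Descriptors are unit length, so the dot product equals cosine similarity.
    """
    def cosine(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(database, key=lambda place: cosine(query, database[place]))

# Toy 2-D "pixel embeddings" standing in for a segmentation model's output.
database = {
    "corridor": pool_descriptor([[1.0, 0.0], [0.9, 0.1]]),
    "lobby": pool_descriptor([[0.0, 1.0], [0.1, 0.9]]),
}
query = pool_descriptor([[0.8, 0.2], [1.0, 0.1]])
best = recognize_place(query, database)
print(best)
```

Because nothing in this sketch is trained, it mirrors the training-free framing of the contribution, but the choice of pooling and similarity measure would need to be checked against the paper itself.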
Findings
Outperforms non-learned image representations
Outperforms off-the-shelf CNN descriptors
Effective in challenging real-world scenarios
Abstract
In vision-based robot localization and SLAM, Visual Place Recognition (VPR) is essential. This paper addresses the problem of VPR, which involves accurately recognizing the location corresponding to a given query image. A popular approach to vision-based place recognition relies on low-level visual features. Despite significant progress in recent years, place recognition based on low-level visual features is challenging when there are changes in scene appearance. To address this, end-to-end training approaches have been proposed to overcome the limitations of hand-crafted features. However, these approaches still fail under drastic changes and require large amounts of labeled data to train models, presenting a significant limitation. Methods that leverage high-level semantic information, such as objects or categories, have been proposed to handle variations in appearance. In this paper,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Speech and dialogue systems
