FM-Loc: Using Foundation Models for Improved Vision-based Localization

Reihaneh Mirjalili; Michael Krawez; Wolfram Burgard

arXiv:2304.07058·cs.RO·April 17, 2023·1 cites

Open Access

TL;DR

FM-Loc introduces a novel vision-based localization method leveraging foundation models like CLIP and GPT-3 to create semantic image descriptors, improving robustness to environmental changes without training.

Contribution

The paper presents a new localization approach using foundation models for semantic scene understanding, eliminating the need for training or fine-tuning.

Findings

01. Effective in indoor environments with viewpoint and object placement changes

02. No training or fine-tuning required for the approach

03. Demonstrates robustness in real-world scenarios

Abstract

Visual place recognition is essential for vision-based robot localization and SLAM. Despite the tremendous progress made in recent years, place recognition in changing environments remains challenging. A promising approach to cope with appearance variations is to leverage high-level semantic features like objects or place categories. In this paper, we propose FM-Loc, a novel image-based localization approach built on foundation models. It uses the large language model GPT-3 in combination with the vision-language model CLIP to construct a semantic image descriptor that is robust to severe changes in scene geometry and camera viewpoint. We deploy CLIP to detect objects in an image, GPT-3 to suggest potential room labels based on the detected objects, and CLIP again to propose the most likely location label. The object labels and the scene label constitute an image descriptor…
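The three-stage pipeline described in the abstract (objects from CLIP, candidate room labels from GPT-3, final scene label from CLIP) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `score_labels` and `suggest_rooms` are hypothetical callables standing in for CLIP and GPT-3, the `top_k` cutoff and the `Descriptor` type are assumptions, and the toy scores are made up so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Hypothetical interfaces (assumptions, not a real CLIP or GPT-3 API):
# ScoreFn maps an image handle and candidate labels to a score per label.
# SuggestFn maps detected object names to candidate room labels.
ScoreFn = Callable[[str, List[str]], Dict[str, float]]
SuggestFn = Callable[[List[str]], List[str]]


@dataclass(frozen=True)
class Descriptor:
    """Semantic image descriptor: a set of object labels plus one scene label."""
    objects: FrozenSet[str]
    scene: str


def build_descriptor(image: str, object_vocab: List[str],
                     score_labels: ScoreFn, suggest_rooms: SuggestFn,
                     top_k: int = 3) -> Descriptor:
    # Stage 1: rank the object vocabulary against the image (CLIP's role).
    object_scores = score_labels(image, object_vocab)
    objects = sorted(object_vocab, key=lambda o: object_scores[o],
                     reverse=True)[:top_k]
    # Stage 2: ask the language model for plausible rooms (GPT-3's role).
    rooms = suggest_rooms(objects)
    # Stage 3: rank the suggested rooms against the image (CLIP again).
    room_scores = score_labels(image, rooms)
    scene = max(rooms, key=lambda r: room_scores[r])
    return Descriptor(frozenset(objects), scene)


# Toy stand-ins with made-up scores, used only to exercise the sketch.
def toy_score(image: str, labels: List[str]) -> Dict[str, float]:
    table = {"sink": 0.9, "stove": 0.8, "fridge": 0.7, "bed": 0.1,
             "kitchen": 0.95, "bathroom": 0.4}
    return {label: table.get(label, 0.0) for label in labels}


def toy_suggest(objects: List[str]) -> List[str]:
    return ["kitchen", "bathroom"]


desc = build_descriptor("img_001", ["bed", "sink", "stove", "fridge"],
                        toy_score, toy_suggest)
print(desc.scene, sorted(desc.objects))
```

Passing the two models in as callables keeps the sketch independent of any particular CLIP or GPT-3 client; how descriptors are then compared for localization is not covered here, since the abstract above is truncated before that point.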


Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications