Online Embedding Multi-Scale CLIP Features into 3D Maps
Shun Taguchi, Hideki Deguchi

TL;DR
This paper presents a real-time method for embedding multi-scale CLIP features into 3D maps, enabling semantic mapping and object search in unfamiliar environments, validated through simulations and robot experiments.
Contribution
It introduces an efficient technique for embedding multi-scale CLIP features into 3D maps online, facilitating real-time semantic mapping and zero-shot object navigation.
Findings
Faster performance than state-of-the-art mapping methods
Higher success rate in object-goal navigation tasks
Effective in both simulated and real robot environments
Abstract
This study introduces a novel approach to online embedding of multi-scale CLIP (Contrastive Language-Image Pre-Training) features into 3D maps. By harnessing CLIP, this methodology surpasses the constraints of conventional vocabulary-limited methods and enables the incorporation of semantic information into the resultant maps. While recent approaches have explored the embedding of multi-modal features in maps, they often impose significant computational costs, making them impractical for exploring unfamiliar environments in real time. Our approach tackles these challenges by efficiently computing and embedding multi-scale CLIP features, thereby facilitating the exploration of unfamiliar environments through real-time map generation. Moreover, embedding CLIP features into the resultant maps makes offline retrieval via linguistic queries feasible. In essence, our approach simultaneously…
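The abstract's retrieval-by-query idea can be illustrated with a minimal sketch. It assumes the map is stored as 3D points with one feature vector each, and that a text query has already been encoded into the same feature space. The function names (`normalize`, `query_map`), the 512-dimensional feature size, and the random toy data are all hypothetical stand-ins for illustration, not the authors' implementation; in practice the features and the query vector would come from a CLIP image and text encoder.

```python
import numpy as np


def normalize(v, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(v, axis=axis, keepdims=True)
    return v / np.maximum(norm, eps)


def query_map(points, features, query, top_k=3):
    """Return the top_k map points whose features best match the query.

    points:   (N, 3) array of 3D positions (hypothetical map layout)
    features: (N, D) array of per-point embeddings
    query:    (D,) embedding of the search phrase
    """
    sims = normalize(features) @ normalize(query)  # cosine similarity per point
    order = np.argsort(-sims)[:top_k]              # indices, highest similarity first
    return points[order], sims[order]


# Toy data standing in for a real map: random positions and random features.
rng = np.random.default_rng(0)
points = rng.uniform(-5.0, 5.0, size=(100, 3))
features = rng.normal(size=(100, 512))

# Stand-in for an encoded text query: a slightly perturbed copy of point 42's feature.
query = features[42] + 0.1 * rng.normal(size=512)

best_points, best_sims = query_map(points, features, query)
```

Here the first returned position is that of point 42, because its feature is nearly parallel to the query. A robot searching for an object could treat the best-scoring position as a navigation goal, although how the paper selects goals is not stated in this summary.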
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Image Processing and 3D Reconstruction
Methods: Contrastive Language-Image Pre-training
