Online Embedding Multi-Scale CLIP Features into 3D Maps

Shun Taguchi; Hideki Deguchi

arXiv:2403.18178 · cs.RO · March 28, 2024 · 1 citation

Open Access

TL;DR

This paper presents a real-time method for embedding multi-scale CLIP features into 3D maps, enabling semantic mapping and object search in unfamiliar environments, validated through simulations and robot experiments.

Contribution

The paper introduces an efficient technique for embedding multi-scale CLIP features into 3D maps online, facilitating real-time semantic mapping and zero-shot object navigation.

Findings

01 Faster performance than state-of-the-art mapping methods

02 Higher success rate in object-goal navigation tasks

03 Effective in both simulated and real robot environments

Abstract

This study introduces a novel approach to online embedding of multi-scale CLIP (Contrastive Language-Image Pre-Training) features into 3D maps. By harnessing CLIP, this methodology surpasses the constraints of conventional vocabulary-limited methods and enables the incorporation of semantic information into the resultant maps. While recent approaches have explored the embedding of multi-modal features in maps, they often impose significant computational costs and are therefore impractical for exploring unfamiliar environments in real time. Our approach tackles these challenges by efficiently computing and embedding multi-scale CLIP features, thereby facilitating the exploration of unfamiliar environments through real-time map generation. Moreover, embedding CLIP features into the resultant maps makes offline retrieval via linguistic queries feasible. In essence, our approach simultaneously…
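As a rough illustration of what retrieval via linguistic queries over a feature-embedded map could look like, the sketch below ranks map points by cosine similarity between their stored embeddings and a query embedding. It is not the authors' implementation: the `feature_map` layout, the `query_map` helper, and the toy 3-dimensional vectors (standing in for real CLIP image and text embeddings) are all assumptions made for this example.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def query_map(feature_map, query_embedding, top_k=1):
    """Return the top_k (position, score) pairs ranked by cosine similarity.

    feature_map: list of (xyz_position, embedding) pairs (hypothetical layout).
    query_embedding: vector standing in for a CLIP text embedding.
    """
    scored = [(pos, cosine(emb, query_embedding)) for pos, emb in feature_map]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors stand in for real CLIP embeddings (assumption for illustration).
toy_map = [
    ((0.0, 0.0, 0.0), [1.0, 0.0, 0.0]),
    ((1.0, 2.0, 0.5), [0.0, 1.0, 0.0]),
    ((3.0, 1.0, 0.5), [0.6, 0.8, 0.0]),
]
toy_query = [0.0, 2.0, 0.0]  # placeholder for an encoded text query

best_position, best_score = query_map(toy_map, toy_query, top_k=1)[0]
```

In a real system the stored vectors and the query would presumably come from CLIP's image and text encoders respectively; the ranking step itself would stay the same.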

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Image Processing and 3D Reconstruction

Methods: Contrastive Language-Image Pre-training