Open-Vocabulary Online Semantic Mapping for SLAM

Tomas Berriel Martins, Martin R. Oswald, and Javier Civera

arXiv:2411.15043 · cs.CV · October 9, 2025

TL;DR

This paper introduces OVO, an online 3D semantic mapping system that describes tracked 3D segments with CLIP vectors and integrates with SLAM backbones, enabling online open-vocabulary mapping with better segmentation metrics and a lower computational and memory footprint than offline baselines.

Contribution

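The paper presents a novel online 3D semantic mapping pipeline that describes 3D segments with CLIP vectors, merged across viewpoints by a new merging method. It achieves better segmentation metrics than offline and online methods, with a lower computational and memory footprint than offline baselines.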

Findings

01. Lower computational and memory footprint than offline baselines.

02. Better segmentation metrics than both offline and online methods.

03. Successful integration with two full SLAM backbones (Gaussian-SLAM and ORB-SLAM2) for end-to-end online mapping with loop closure.

Abstract

This paper presents an Open-Vocabulary Online 3D semantic mapping pipeline, which we denote by its acronym OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors. These vectors are computed from the viewpoints in which each segment is observed, using a novel CLIP merging method. Notably, OVO has a significantly lower computational and memory footprint than offline baselines, while also achieving better segmentation metrics than both offline and online ones. In addition to superior segmentation performance, we show experimental results of our mapping contributions integrated with two different full SLAM backbones (Gaussian-SLAM and ORB-SLAM2). This makes ours the first work to use a neural network to merge CLIP descriptors and to demonstrate end-to-end open-vocabulary online 3D mapping with loop closure.
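
To make the idea of describing 3D segments with CLIP vectors more concrete, here is a minimal sketch of how per-view descriptors of one segment might be fused and then matched against text embeddings by cosine similarity. It is not the OVO implementation: the function names (`merge_view_descriptors`, `label_segments`), the toy 3-dimensional vectors, and the weights are all hypothetical, and the weighted mean is a simple stand-in for the learned merging network mentioned in the abstract.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def merge_view_descriptors(view_descs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Fuse the per-view descriptors of one 3D segment into a single vector.

    Stand-in only: a weighted mean of unit vectors, NOT the learned
    merging network described in the abstract.
    """
    w = weights / weights.sum()
    return l2_normalize((w[:, None] * l2_normalize(view_descs)).sum(axis=0))


def label_segments(segment_descs: np.ndarray, text_embeds: np.ndarray) -> np.ndarray:
    """Return, for each segment, the index of the most similar text embedding."""
    sims = l2_normalize(segment_descs) @ l2_normalize(text_embeds).T
    return sims.argmax(axis=1)


# Toy vectors standing in for CLIP text features of two class names (assumed).
text_embeds = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# One segment observed from three viewpoints; weights are hypothetical
# per-view scores (for example, how much of the segment is visible).
views = np.array([[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.1, 0.9, 0.0]])
weights = np.array([5.0, 4.0, 1.0])

segment_desc = merge_view_descriptors(views, weights)
labels = label_segments(segment_desc[None, :], text_embeds)
print(labels)
```

Because the class list only enters at query time through `text_embeds`, the same stored segment descriptors can be re-labelled with a different vocabulary without rebuilding the map, which is the sense in which such a map is open-vocabulary.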

Code & Models

Repositories

tberriel/ovo (PyTorch, official)

Taxonomy

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training