Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models

Mohamad Al Mdfaa; Raghad Salameh; Geesara Kulathunga; Sergey Zagoruyko; Gonzalo Ferrer

arXiv:2405.02162·cs.CV·February 4, 2026

TL;DR

This paper introduces UPPM, a novel panoptic mapping method that uses foundation models to produce a unified, open-vocabulary, and promptable map with improved accuracy and semantic consistency, without additional training.

Contribution

The paper presents an approach that integrates foundation models into panoptic mapping, enabling open-vocabulary dynamic labeling that draws on geometric size priors and requires no additional model training.

Findings

1. In the authors' evaluation, UPPM shows the best overall map reconstruction accuracy.

2. In the same evaluation, UPPM shows the best overall panoptic segmentation quality.

3. Ablation studies highlight the importance of each component.

Abstract

Panoptic maps enable robots to reason about both geometry and semantics. However, open-vocabulary models repeatedly produce closely related labels that split panoptic entities and degrade volumetric consistency. The proposed UPPM advances open-world scene understanding by leveraging foundation models to introduce a panoptic Dynamic Descriptor that reconciles open-vocabulary labels with unified category structure and geometric size priors. The fusion for such dynamic descriptors is performed within a multi-resolution multi-TSDF map using language-guided open-vocabulary panoptic segmentation and semantic retrieval, resulting in a persistent and promptable panoptic map without additional model training. Based on our evaluation experiments, UPPM shows the best overall performance in terms of the map reconstruction accuracy and the panoptic segmentation quality. The ablation study…
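To make the idea of reconciling closely related open-vocabulary labels more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each open-vocabulary label is matched to the nearest entry in a fixed category set by cosine similarity, and that each category carries a coarse size class. The class name `DynamicDescriptor`, its fields, the function `describe`, and the toy three-dimensional embeddings are all hypothetical; a real system would obtain embeddings from a language or vision-language model.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class DynamicDescriptor:
    """Hypothetical container; the paper's actual descriptor may differ."""
    open_label: str        # label as emitted by the open-vocabulary model
    unified_category: str  # entry of the fixed category set it maps to
    size_prior: str        # coarse size class attached to that category


# Toy embeddings, hand-written for illustration only (assumption).
CATEGORY_EMBEDDINGS = {
    "chair": (1.0, 0.0, 0.0),
    "table": (0.0, 1.0, 0.0),
    "cup": (0.0, 0.0, 1.0),
}
SIZE_PRIORS = {"chair": "medium", "table": "large", "cup": "small"}

LABEL_EMBEDDINGS = {
    "armchair": (0.9, 0.1, 0.0),
    "office chair": (0.95, 0.05, 0.0),
    "desk": (0.1, 0.9, 0.0),
    "mug": (0.0, 0.1, 0.9),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def describe(open_label):
    """Map an open-vocabulary label to its most similar unified category."""
    embedding = LABEL_EMBEDDINGS[open_label]
    category = max(
        CATEGORY_EMBEDDINGS,
        key=lambda name: cosine(embedding, CATEGORY_EMBEDDINGS[name]),
    )
    return DynamicDescriptor(open_label, category, SIZE_PRIORS[category])


if __name__ == "__main__":
    for label in LABEL_EMBEDDINGS:
        print(describe(label))
```

In this toy setup, "armchair" and "office chair" resolve to the same unified category, so observations carrying either label could be fused into one entity rather than split across two. The original label is kept alongside the category, which is one plausible way to keep the map queryable by free-form prompts.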
