Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM

Laksh Nanwani; Kumaraditya Gupta; Aditya Mathur; Swayam Agrawal; A.H. Abdul Hafez; K. Madhava Krishna

arXiv:2404.17922·cs.CV·October 28, 2025

TL;DR

This paper introduces O3D-SIM, a 3D semantic instance mapping method that enhances vision-language navigation by integrating instance-level embeddings into 3D point clouds, improving task success rates and object identification.

Contribution

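O3D-SIM extends the authors' earlier 2D instance-level semantic maps (SI Maps) to 3D, leveraging foundational models for object recognition, image segmentation, and feature extraction to build a more robust and detailed understanding of the environment for navigation tasks.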

Findings

01. Improved success rate in language-guided navigation tasks.

02. Enhanced ability to identify objects beyond closed-set limitations.

03. Qualitative improvements in instance clarity and semantic understanding.

Abstract

Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work, SI Maps (Nanwani L, Agarwal A, Jain K, et al. Instance-level semantic maps for vision language navigation. In: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). IEEE; 2023 Aug.), showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasing the pipeline's robustness and improving quantitative and qualitative results. Our method leverages foundational models for object recognition, image segmentation, and feature extraction. We propose a representation that results in a 3D point cloud map with instance-level embeddings,…
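To make the idea of a point cloud map with instance-level embeddings concrete, here is a minimal Python sketch, not the authors' implementation. The `Instance` layout, the `rank_instances` helper, and the three-dimensional toy vectors are all assumptions made for illustration; in a real pipeline the embeddings would come from a vision-language feature extractor, and the query embedding from its text encoder.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Instance:
    """One object instance in the map (hypothetical layout, not the paper's schema)."""
    instance_id: int
    points: np.ndarray     # (N, 3) xyz points belonging to this instance
    embedding: np.ndarray  # (D,) feature vector describing the instance

    def centroid(self) -> np.ndarray:
        # Mean of the instance's points, usable as a rough navigation goal.
        return self.points.mean(axis=0)


def rank_instances(instances, query_embedding):
    """Return (instance, cosine similarity) pairs, best match first."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scored = []
    for inst in instances:
        e = inst.embedding / np.linalg.norm(inst.embedding)
        scored.append((inst, float(q @ e)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy map with two instances; the embeddings are made-up stand-ins.
chair = Instance(0, np.array([[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]]),
                 np.array([1.0, 0.0, 0.0]))
table = Instance(1, np.array([[4.0, 4.0, 0.0], [6.0, 4.0, 2.0]]),
                 np.array([0.0, 1.0, 0.0]))

# Stand-in for the embedding of a language query such as "go to the table".
query = np.array([0.1, 0.9, 0.0])

ranked = rank_instances([chair, table], query)
best, score = ranked[0]
goal = best.centroid()
```

Because matching is done by similarity in embedding space rather than against a fixed label list, a query is not restricted to a predefined set of classes, which is the sense in which such a map is open-set.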

Code & Models

Repositories

Smart-Wheelchair-RRC/o3d-sim
PyTorch · Official
