QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding

Yash Mehan; Kumaraditya Gupta; Rohit Jayanti; Anirudh Govil; Sourav Garg; Madhava Krishna

arXiv:2404.06442·cs.CV·December 13, 2024·2 cites

Open Access

TL;DR

QueSTMaps introduces a novel pipeline for 3D scene understanding that combines topological mapping with semantic labeling, enabling natural language queries and outperforming existing methods in room segmentation and classification.

Contribution

The paper presents a new two-step approach that constructs topological maps and generates CLIP-aligned semantic features for improved 3D scene understanding.

Findings

01 Outperforms the state of the art in room segmentation by ~20%

02 Achieves a ~12% improvement in room classification

03 Supports natural language queries for scene navigation

Abstract

Robotic tasks such as planning and navigation require a hierarchical semantic understanding of a scene, which could include multiple floors and rooms. Current methods primarily focus on object segmentation for 3D scene understanding. However, such methods struggle to segment out topological regions like "kitchen" in the scene. In this work, we introduce a two-step pipeline to solve this problem. First, we extract a topological map, i.e., a floorplan of the indoor scene, using a novel multi-channel occupancy representation. Then, we generate CLIP-aligned features and semantic labels for every room instance based on the objects it contains using a self-attention transformer. Our language-topology alignment supports natural language querying, e.g., the query "place to cook" locates the "kitchen". We outperform the current state-of-the-art on room segmentation by ~20% and room classification by ~12%.…
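
The querying idea in the abstract can be illustrated with a minimal sketch: if every room instance has a feature vector in the same space as a text embedding, a query can be matched to rooms by cosine similarity. This is not the authors' implementation. The function names (`cosine`, `query_rooms`), the room identifiers, and the 3-dimensional toy vectors are all assumptions made for illustration; in practice the vectors would come from the room-feature model and a CLIP text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def query_rooms(query_vec, room_features):
    """Rank rooms by similarity to the query, best match first."""
    scores = [(room_id, cosine(query_vec, feat))
              for room_id, feat in room_features.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-room features (toy 3-D stand-ins for CLIP-aligned vectors).
room_features = {
    "room_0": [0.9, 0.1, 0.0],
    "room_1": [0.1, 0.9, 0.1],
    "room_2": [0.0, 0.2, 0.9],
}

# Hypothetical stand-in for the text embedding of a query like "place to cook".
query_vec = [0.8, 0.2, 0.1]

ranked = query_rooms(query_vec, room_features)
best_room = ranked[0][0]
print(best_room, ranked)
```

With these toy values the query vector lies closest to `room_0`, so that room is ranked first. Cosine similarity is used here because it is the usual way to compare CLIP-style embeddings; the listing above does not say which similarity measure the paper uses.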

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques · Robotics and Sensor-Based Localization