DISC: Dense Integrated Semantic Context for Large-Scale Open-Set Semantic Mapping
Felix Igelbrink, Lennart Niecksch, Martin Atzmueller, Joachim Hertzberg

TL;DR
DISC is an open-set semantic mapping method that extracts mask-aligned CLIP embeddings in a single pass from the vision transformer's intermediate layers and refines instances at the voxel level on the GPU, targeting large-scale, real-time robotic perception.
Contribution
The paper presents a single-pass, GPU-accelerated semantic mapping approach that derives high-quality CLIP embeddings from intermediate transformer layers, avoiding the latency and context loss of crop-based feature extraction.
Findings
Outperforms state-of-the-art zero-shot methods in accuracy and retrieval.
Demonstrates scalability across large, complex indoor scenes.
Enables real-time, dense semantic mapping for robotic applications.
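The retrieval setting mentioned above can be illustrated with a minimal sketch: given one embedding per mapped instance and a text-query embedding in the same CLIP space, instances are ranked by cosine similarity. This is a generic formulation and not taken from the paper; the function name `rank_instances` and the toy 2-D vectors are assumptions made for illustration only.

```python
import numpy as np

def rank_instances(query_embedding, instance_embeddings):
    """Rank instances by cosine similarity to a query embedding.

    Hypothetical helper for illustration; not the authors' implementation.
    query_embedding: shape (D,), instance_embeddings: shape (N, D).
    Returns (indices sorted best-first, similarity score per instance).
    """
    q = query_embedding / np.linalg.norm(query_embedding)
    m = instance_embeddings / np.linalg.norm(
        instance_embeddings, axis=1, keepdims=True
    )
    scores = m @ q               # cosine similarity of each instance to the query
    order = np.argsort(-scores)  # highest similarity first
    return order, scores

# Toy stand-ins for CLIP embeddings (real ones have hundreds of dimensions).
instances = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.1])
order, scores = rank_instances(query, instances)
```

In an actual pipeline the query vector would come from the CLIP text encoder and the instance vectors from the map; here both are placeholders.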
Abstract
Open-set semantic mapping enables language-driven robotic perception, but current instance-centric approaches are bottlenecked by context-depriving and computationally expensive crop-based feature extraction. To overcome this fundamental limitation, we introduce DISC (Dense Integrated Semantic Context), featuring a novel single-pass, distance-weighted extraction mechanism. By deriving high-fidelity CLIP embeddings directly from the vision transformer's intermediate layers, our approach eliminates the latency and domain-shift artifacts of traditional image cropping, yielding pure, mask-aligned semantic representations. To fully leverage these features in large-scale continuous mapping, DISC is built upon a fully GPU-accelerated architecture that replaces periodic offline processing with precise, on-the-fly voxel-level instance refinement. We evaluate our approach on standard benchmarks…
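To make the idea of single-pass, distance-weighted, mask-aligned extraction more concrete, the following is a rough sketch under stated assumptions. It assumes a dense grid of patch features has already been produced by one forward pass of the vision transformer, and it weights each in-mask patch by its distance to the mask boundary so that interior patches dominate the pooled embedding. The exact weighting used by DISC is not given in this summary, so the weighting scheme and the names `boundary_distance` and `pool_mask_embedding` are assumptions, not the authors' method.

```python
import numpy as np

def boundary_distance(mask):
    """Distance from each in-mask cell to the nearest out-of-mask cell.

    Brute-force Euclidean distance, adequate for small patch grids.
    Cells outside the mask get distance 0.
    """
    inside = np.argwhere(mask)
    outside = np.argwhere(~mask)
    dist = np.zeros(mask.shape, dtype=np.float64)
    if len(inside) == 0:
        return dist
    if len(outside) == 0:
        dist[mask] = 1.0  # mask covers the whole grid: uniform weights
        return dist
    diff = inside[:, None, :] - outside[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    dist[inside[:, 0], inside[:, 1]] = nearest
    return dist

def pool_mask_embedding(patch_features, mask):
    """Pool (H, W, D) patch features into one unit-length vector for a mask.

    Assumed weighting: each patch counts in proportion to its distance
    from the mask boundary (an illustrative choice, not from the paper).
    """
    weights = boundary_distance(mask)
    total = weights.sum()
    if total == 0:
        raise ValueError("mask is empty")
    pooled = (patch_features * weights[..., None]).sum(axis=(0, 1)) / total
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

# Toy 4x4 patch grid: left half carries feature [1, 0], right half [0, 1].
features = np.zeros((4, 4, 2))
features[:, :2, 0] = 1.0
features[:, 2:, 1] = 1.0
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True  # instance mask covering the left half
embedding = pool_mask_embedding(features, mask)
```

Because every mask reuses the same feature grid, the encoder runs once per image rather than once per crop, which is the cost saving the abstract describes.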
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization
