INSIGHT: Indoor Scene Intelligence from Geometric-Semantic Hierarchy Transfer for Public Safety

Alexander Nikitas Dimopoulos; Joseph Grasso; John Beltz

arXiv:2604.23095·cs.CV·April 28, 2026


TL;DR

INSIGHT introduces a pipeline that needs no target-domain annotation and converts 2D image understanding into 3D scene graphs for indoor safety. It targets two barriers: scarce labeled indoor training data and poor recognition of small safety-critical features.

Contribution

INSIGHT presents a 2D-to-3D semantic transfer method in which two interchangeable vision stacks share a common 3D back end, supporting indoor scene understanding for public safety applications.

Findings

01

Achieves high per-point labeling accuracy on safety classes.

02

Detects safety-critical features absent in public benchmarks.

03

Provides compact, deployable scene graphs with fast transmission.

Abstract

Indoor environments lack the spatial intelligence infrastructure that GPS provides outdoors; first responders arriving at unfamiliar buildings typically have no machine-readable map of safety equipment. Prior work on 3D semantic segmentation for public safety identified two barriers: scarcity of labeled indoor training data and poor recognition of small safety-critical features by native point-cloud methods. This paper presents INSIGHT, a zero-target-domain-annotation pipeline that projects 2D image understanding into 3D metric space via registered RGB-D data. Two interchangeable vision stacks share a common 3D back end: a SAM3 foundation-model stack for text-prompted segmentation, and a traditional CV stack (open-set detection, VQA, OCR) whose intermediate outputs are independently inspectable. Evaluated on all seven subareas of Stanford 2D-3D-S (70,496 images), the pipeline produces…
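The abstract's core step, projecting 2D image understanding into 3D metric space via registered RGB-D data, can be illustrated with a generic pinhole back-projection. The sketch below is not the paper's implementation. The function names (`backproject_pixel`, `camera_to_world`, `lift_labels`), the intrinsics `fx, fy, cx, cy`, the pose `rotation`/`translation`, and the label strings are all hypothetical. It also assumes depth is stored as distance along the optical axis; data that stores distance along the viewing ray would need converting first.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_pixel(u: float, v: float, depth: float,
                      fx: float, fy: float, cx: float, cy: float) -> Point:
    """Lift pixel (u, v) with z-depth `depth` into camera coordinates (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_world(p: Point,
                    rotation: Sequence[Sequence[float]],
                    translation: Sequence[float]) -> Point:
    """Apply the camera-to-world rigid transform: R @ p + t."""
    return tuple(
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def lift_labels(depth_map: Sequence[Sequence[float]],
                label_map: Sequence[Sequence[Optional[str]]],
                fx: float, fy: float, cx: float, cy: float,
                rotation: Sequence[Sequence[float]],
                translation: Sequence[float]) -> List[Tuple[Point, str]]:
    """Return (world_point, label) pairs for every labeled pixel with valid depth."""
    points = []
    for v, (depth_row, label_row) in enumerate(zip(depth_map, label_map)):
        for u, (d, label) in enumerate(zip(depth_row, label_row)):
            # Skip pixels with missing depth or no 2D label.
            if d <= 0 or label is None:
                continue
            p_cam = backproject_pixel(u, v, d, fx, fy, cx, cy)
            points.append((camera_to_world(p_cam, rotation, translation), label))
    return points


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    depth = [[2.0, 0.0], [1.0, 2.0]]
    labels = [["extinguisher", "extinguisher"], [None, "exit_sign"]]
    for point, label in lift_labels(depth, labels, 1.0, 1.0, 0.0, 0.0,
                                    identity, [1.0, 0.0, 0.0]):
        print(label, point)
```

In this toy setup only two of the four pixels survive: one has no depth reading and one has no 2D label. Because the 3D side consumes only a per-pixel label map, either vision stack could in principle supply that map without changes downstream.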
