INSIGHT: Indoor Scene Intelligence from Geometric-Semantic Hierarchy Transfer for Public Safety
Alexander Nikitas Dimopoulos, Joseph Grasso, John Beltz

TL;DR
INSIGHT introduces a pipeline, requiring no target-domain annotation, that converts 2D image understanding into 3D scene graphs for indoor safety. It addresses two barriers: scarce labeled indoor training data and poor recognition of small safety-critical features.
Contribution
It presents a novel 2D-to-3D semantic transfer method with interchangeable vision stacks, enabling efficient indoor scene understanding for public safety applications.
Findings
Achieves high per-point labeling accuracy on safety classes.
Detects safety-critical features absent in public benchmarks.
Provides compact, deployable scene graphs with fast transmission.
Abstract
Indoor environments lack the spatial intelligence infrastructure that GPS provides outdoors; first responders arriving at unfamiliar buildings typically have no machine-readable map of safety equipment. Prior work on 3D semantic segmentation for public safety identified two barriers: scarcity of labeled indoor training data and poor recognition of small safety-critical features by native point-cloud methods. This paper presents INSIGHT, a zero-target-domain-annotation pipeline that projects 2D image understanding into 3D metric space via registered RGB-D data. Two interchangeable vision stacks share a common 3D back end: a SAM3 foundation-model stack for text-prompted segmentation, and a traditional CV stack (open-set detection, VQA, OCR) whose intermediate outputs are independently inspectable. Evaluated on all seven subareas of Stanford 2D-3D-S (70,496 images), the pipeline produces…
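The step of projecting 2D labels into 3D metric space via registered RGB-D data can be illustrated with a minimal sketch. It assumes a standard pinhole camera model with known intrinsics and a known camera-to-world pose; the function name, parameter names, and label strings are hypothetical and are not taken from the paper, which may implement this step differently.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_labels(
    depth: Sequence[Sequence[float]],             # metric depth per pixel (0 = invalid)
    labels: Sequence[Sequence[Optional[str]]],    # 2D per-pixel label, None = unlabeled
    fx: float, fy: float, cx: float, cy: float,   # assumed pinhole intrinsics
    rotation: Sequence[Sequence[float]],          # assumed 3x3 camera-to-world rotation
    translation: Sequence[float],                 # assumed camera position in world frame
) -> Dict[str, List[Point3D]]:
    """Lift labeled pixels into world-space 3D points, grouped by label."""
    points: Dict[str, List[Point3D]] = {}
    for v, (depth_row, label_row) in enumerate(zip(depth, labels)):
        for u, (z, label) in enumerate(zip(depth_row, label_row)):
            if label is None or z <= 0.0:
                continue  # skip unlabeled pixels and invalid depth readings
            # Pinhole back-projection from pixel (u, v) to camera coordinates.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Rigid transform from camera frame to world frame.
            world = tuple(
                sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            points.setdefault(label, []).append(world)
    return points
```

Each labeled pixel with a valid depth reading becomes one labeled 3D point. Aggregating such points across registered views is one plausible way to obtain per-point labels of the kind the abstract describes.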
