TL;DR
ProCap is a framework that semantically decouples physical scenes from projected content in spatial augmented reality (SAR), improving scene understanding and interaction.
Contribution
It presents a novel two-stage pipeline (segmentation followed by region-aware retrieval), together with a large-scale SAR dataset and an evaluation protocol, to improve semantic reasoning in SAR.
Findings
ProCap effectively isolates virtual and physical layers in SAR scenes.
The RGBP dataset provides extensive annotations for SAR semantic understanding.
Experimental results demonstrate improved scene and projection comprehension.
Abstract
Spatial augmented reality (SAR) uses projectors to display digital content directly onto physical scenes, creating immersive experiences without head-mounted displays. However, for SAR to support intelligent interaction, such as reasoning about the scene or answering user queries, it must semantically distinguish between the physical scene and the projected content. Standard Vision Language Models (VLMs) struggle with this virtual-physical ambiguity and often confuse the two contexts. To address this issue, we introduce ProCap, a novel framework that explicitly decouples projected content from physical scenes. ProCap employs a two-stage pipeline: first, it visually isolates the virtual and physical layers via automated segmentation; then, it uses region-aware retrieval to avoid the ambiguous semantic context caused by projection distortion. To support this, we present RGBP (RGB + Projections), the…
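The two-stage idea in the abstract can be illustrated with a minimal sketch. It assumes the segmentation mask is already available (True marks projected pixels) and that each region has already been embedded by some encoder. The function names (`decouple_layers`, `bounding_box`, `region_retrieve`), the per-region database layout, and cosine-similarity retrieval are all hypothetical illustrations, not ProCap's actual implementation.

```python
from math import sqrt
from typing import Dict, List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]  # hypothetical convention: True = projected content


def decouple_layers(image: Image, mask: Mask):
    """Stage 1 (sketch): split an RGB frame into physical and projected layers.

    Pixels outside a layer become None, so each downstream model
    sees only one context (physical or projected) at a time.
    """
    physical = [[None if m else px for px, m in zip(row, mrow)]
                for row, mrow in zip(image, mask)]
    projected = [[px if m else None for px, m in zip(row, mrow)]
                 for row, mrow in zip(image, mask)]
    return physical, projected


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return inclusive (top, left, bottom, right) of the projected region, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, m in enumerate(row) if m]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def region_retrieve(region_embeddings: Dict[str, Sequence[float]],
                    database: Dict[str, Dict[str, Sequence[float]]],
                    k: int = 1) -> Dict[str, List[str]]:
    """Stage 2 (sketch): retrieve context separately for each region.

    Each region ("physical", "projected") queries only its own hypothetical
    database, so context retrieved for projected content never mixes with
    context retrieved for the physical scene.
    """
    results: Dict[str, List[str]] = {}
    for region, query in region_embeddings.items():
        candidates = database.get(region, {})
        ranked = sorted(candidates,
                        key=lambda key: cosine(query, candidates[key]),
                        reverse=True)
        results[region] = ranked[:k]
    return results
```

For example, with a 2x2 frame whose top-left pixel is projected, `decouple_layers` yields one layer containing only that pixel and another containing the remaining three. `region_retrieve` then ranks candidates for each layer independently. Keeping retrieval per region is one simple way to keep the two semantic contexts from blending.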
