ProCap: Projection-Aware Captioning for Spatial Augmented Reality

Zimo Cao; Yuchen Deng; Haibin Ling; Bingyao Huang

arXiv:2604.00912 · cs.CV · April 10, 2026

TL;DR

ProCap introduces a framework for semantic decoupling of physical scenes and projected content in spatial augmented reality, enhancing scene understanding and interaction.

Contribution

The paper presents a two-stage segmentation and retrieval pipeline, together with a large-scale SAR dataset and an evaluation protocol for semantic reasoning.

Findings

01 ProCap effectively isolates virtual and physical layers in SAR scenes.

02 The RGBP dataset provides extensive annotations for SAR semantic understanding.

03 Experimental results demonstrate improved scene and projection comprehension.

Abstract

Spatial augmented reality (SAR) directly projects digital content onto physical scenes using projectors, creating immersive experiences without head-mounted displays. However, for SAR to support intelligent interaction, such as reasoning about the scene or answering user queries, it must semantically distinguish between the physical scene and the projected content. Standard Vision Language Models (VLMs) struggle with this virtual-physical ambiguity, often confusing the two contexts. To address this issue, we introduce ProCap, a novel framework that explicitly decouples projected content from physical scenes. ProCap employs a two-stage pipeline: first, it visually isolates virtual and physical layers via automated segmentation; then, it uses region-aware retrieval to avoid ambiguous semantic context due to projection distortion. To support this, we present RGBP (RGB + Projections), the…
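
The two-stage idea in the abstract (isolate the projected layer, then retrieve semantics for that region) can be illustrated with a toy sketch. This is not the authors' implementation: the projection mask is assumed to be given rather than produced by automated segmentation, the mean-colour descriptor is a stand-in for a learned region embedding, and the names `split_layers`, `region_descriptor`, and `retrieve` are hypothetical.

```python
import numpy as np


def split_layers(image, projection_mask):
    """Split an HxWx3 image into (physical, virtual) layers.

    ASSUMPTION: projection_mask is a boolean HxW array marking projected
    pixels. A real system would obtain it from a segmentation model.
    """
    mask = projection_mask.astype(bool)[..., None]
    virtual = np.where(mask, image, 0)   # keep only projected pixels
    physical = np.where(mask, 0, image)  # keep only non-projected pixels
    return physical, virtual


def region_descriptor(image, mask):
    """Toy region descriptor: L2-normalised mean colour of masked pixels.

    ASSUMPTION: stands in for a learned region embedding.
    """
    pixels = image[mask.astype(bool)]
    if pixels.size == 0:
        return np.zeros(3)
    vec = pixels.mean(axis=0).astype(float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def retrieve(query, bank_vectors, bank_labels):
    """Return the label whose bank vector has the highest dot product with query."""
    sims = bank_vectors @ query
    return bank_labels[int(np.argmax(sims))]


# Synthetic scene: left half is "projected" red, right half is a green surface.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :2] = (255, 0, 0)
image[:, 2:] = (0, 255, 0)
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True

physical, virtual = split_layers(image, mask)
bank_vectors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
bank_labels = ["red content", "green content"]
projected_label = retrieve(region_descriptor(image, mask), bank_vectors, bank_labels)
surface_label = retrieve(region_descriptor(image, ~mask), bank_vectors, bank_labels)
```

Describing each region separately, instead of the whole frame at once, is what keeps the projected content from being attributed to the physical surface in this sketch.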

Code & Models

Repositories

https://ZimoCao.github.io/ProCap
