Camera-Aware Cross-View Alignment for Referring 3D Gaussian Splatting Segmentation
Yuwen Tao, Kanglei Zhou, Xin Tan, Yuan Xie

TL;DR
This paper introduces CaRF, a novel camera-aware framework for 3D Gaussian splatting segmentation that enhances view consistency and grounding accuracy in language-based 3D scene understanding.
Contribution
CaRF incorporates camera geometry into Gaussian interactions and aligns cross-view responses, enabling geometry-aware, view-consistent reasoning in 3D segmentation.
Findings
Achieves state-of-the-art performance on three benchmarks.
Improves mIoU by up to 16.8% over previous methods.
Demonstrates effective cross-view alignment and reasoning.
Abstract
Referring 3D Gaussian Splatting Segmentation (R3DGS) aims to ground free-form language queries in 3D Gaussian fields. However, existing methods rely on single-view pseudo supervision, leading to viewpoint drift and inconsistent predictions across views. We propose CaRF (Camera-aware Referring Field), a camera-aware cross-view alignment framework for view-consistent referring in 3D Gaussian splatting. CaRF introduces Camera-conditioned Alignment Modulation (CAM) to inject camera geometry into Gaussian-text interactions, and Gaussian-level Cross-view Logit Alignment (GCLA) to explicitly align referring responses of the same Gaussians across calibrated views during training. By turning cross-view discrepancy into an optimizable objective, CaRF enables geometry-aware and view-consistent reasoning directly in the Gaussian space. Extensive experiments on three benchmarks demonstrate that CaRF…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Advanced Neural Network Applications · 3D Shape Modeling and Analysis
