Leverage Cross-Attention for End-to-End Open-Vocabulary Panoptic Reconstruction
Xuan Yu, Yuxuan Xie, Yili Liu, Haojian Lu, Rong Xiong, Yiyi Liao, and Yue Wang

TL;DR
PanopticRecon++ introduces an end-to-end cross-attention approach with learnable 3D Gaussian instance queries for open-vocabulary panoptic 3D scene reconstruction, injecting 3D spatial priors and improving semantic consistency.
Contribution
It presents a new cross-attention formulation in which learnable 3D Gaussian queries inject spatial priors and enable open-vocabulary panoptic reconstruction in an end-to-end manner.
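To make the query/key view concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation. It assumes that each instance query k carries a learnable feature vector plus a Gaussian mean μ_k and covariance Σ_k, and that the 3D embedding field has already been evaluated at N sample points. It further assumes that the attention logit is the feature dot product plus the Gaussian log-density of the point, which serves as the spatial prior. A softmax over the K queries then gives a per-point instance distribution. The names `gaussian_log_density` and `instance_attention` and the temperature `tau` are hypothetical. In practice these arrays would be tensors in an autodiff framework, so that means, covariances, and embeddings can all be optimized end to end.

```python
import numpy as np


def gaussian_log_density(points, means, covs):
    """Log N(x | mu_k, Sigma_k) for every (point, query) pair; returns shape (N, K)."""
    n, k = points.shape[0], means.shape[0]
    dim = points.shape[1]
    out = np.empty((n, k))
    for j in range(k):
        diff = points - means[j]                      # (N, D) offsets from the Gaussian center
        inv = np.linalg.inv(covs[j])                  # (D, D) precision matrix
        maha = np.einsum("ni,ij,nj->n", diff, inv, diff)  # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(covs[j])
        out[:, j] = -0.5 * (maha + logdet + dim * np.log(2.0 * np.pi))
    return out


def instance_attention(point_emb, query_emb, points, means, covs, tau=1.0):
    """Cross-attention map: each of N points (keys) distributes over K instance queries.

    point_emb: (N, F) embedding-field features at the sample points (keys)
    query_emb: (K, F) hypothetical learnable per-instance features (queries)
    points, means: (N, D), (K, D); covs: (K, D, D)
    """
    semantic = point_emb @ query_emb.T / tau          # (N, K) feature similarity
    spatial = gaussian_log_density(points, means, covs)  # (N, K) proximity prior
    logits = semantic + spatial
    logits -= logits.max(axis=1, keepdims=True)       # stabilize the softmax
    attn = np.exp(logits)
    return attn / attn.sum(axis=1, keepdims=True)     # each row sums to 1


if __name__ == "__main__":
    pts = np.array([[0.1, 0.0, 0.0], [4.9, 0.0, 0.0]])
    mus = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
    sig = np.stack([np.eye(3), np.eye(3)])
    feats = np.ones((2, 4))                           # identical features for both queries
    print(instance_attention(feats, feats, pts, mus, sig).argmax(axis=1))
```

Because the Gaussian term decays with Mahalanobis distance, two instances with identical features are still separated by where they sit in space. This is the proximity-preserving behavior the abstract attributes to Gaussian queries.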
Findings
Achieves competitive 3D and 2D segmentation and reconstruction results.
Effectively aligns 2D open-vocabulary instance IDs across frames.
Demonstrates applicability in robotic simulation scenarios.
Abstract
Open-vocabulary panoptic reconstruction offers comprehensive scene understanding, enabling advances in embodied robotics and photorealistic simulation. In this paper, we propose PanopticRecon++, an end-to-end method that formulates panoptic reconstruction through a novel cross-attention perspective. This perspective models the relationship between 3D instances (as queries) and the scene's 3D embedding field (as keys) through their attention map. Unlike existing methods that separate the optimization of queries and keys or overlook spatial proximity, PanopticRecon++ introduces learnable 3D Gaussians as instance queries. This formulation injects 3D spatial priors to preserve proximity while maintaining end-to-end optimizability. Moreover, this query formulation facilitates the alignment of 2D open-vocabulary instance IDs across frames by leveraging optimal linear assignment with instance…
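The abstract is cut off before it states the matching criterion, so this sketch assumes one. Each frame's 2D open-vocabulary instance masks are matched one-to-one to masks rendered from the 3D instance queries, and the matching maximizes total IoU. For readability, the linear assignment is solved by brute force over permutations, which is only feasible for a handful of instances. A practical system would use the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`. All names here (`mask_iou`, `optimal_assignment`, `align_frame_ids`) are hypothetical, and masks are represented as sets of pixel indices.

```python
from itertools import permutations


def mask_iou(a, b):
    """Intersection-over-union of two masks given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def optimal_assignment(score):
    """One-to-one matching maximizing the summed score (brute force, small inputs only).

    score: M x K nested list; returns a list of (row, col) pairs.
    """
    m = len(score)
    k = len(score[0]) if m else 0
    best, best_pairs = float("-inf"), []
    if m <= k:
        # Every row gets a distinct column.
        for cols in permutations(range(k), m):
            total = sum(score[i][c] for i, c in enumerate(cols))
            if total > best:
                best, best_pairs = total, list(enumerate(cols))
    else:
        # More rows than columns: every column gets a distinct row.
        for rows in permutations(range(m), k):
            total = sum(score[r][j] for j, r in enumerate(rows))
            if total > best:
                best, best_pairs = total, [(r, j) for j, r in enumerate(rows)]
    return best_pairs


def align_frame_ids(frame_masks, query_masks):
    """Map per-frame 2D instance IDs to global 3D query indices.

    frame_masks: dict {frame-local id: set of pixels} from a 2D segmenter
    query_masks: list of pixel sets rendered from each 3D instance query
    """
    ids = sorted(frame_masks)
    score = [[mask_iou(frame_masks[i], q) for q in query_masks] for i in ids]
    return {ids[r]: c for r, c in optimal_assignment(score)}


if __name__ == "__main__":
    queries = [{0, 1, 2, 3}, {10, 11, 12}, {20, 21}]
    frame = {7: {11, 12, 13}, 3: {0, 1, 2}}
    print(align_frame_ids(frame, queries))
```

Solving the assignment jointly rather than greedily matters. Consider the scores [[0.9, 0.8], [0.85, 0.1]]. Greedily taking the 0.9 pair forces a total of 1.0. The optimal matching pairs row 0 with column 1 and row 1 with column 0, for a total of 1.65.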
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Softmax · Attention Is All You Need
