IPFormer: Visual 3D Panoptic Scene Completion with Context-Adaptive Instance Proposals
Markus Gross, Aya Fahmy, Danit Niwattananan, Dominik Muhle, Rui Song, Daniel Cremers, Henri Meeß

TL;DR
IPFormer introduces a novel approach for vision-based 3D Panoptic Scene Completion using context-adaptive instance proposals, significantly improving accuracy, generalization, and efficiency over previous fixed-query Transformer methods.
Contribution
IPFormer is the first method to use context-adaptive instance proposals for vision-based 3D Panoptic Scene Completion (PSC), so that queries adapt to the observed scene at both train and test time rather than staying fixed after training.
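To make the distinction between fixed queries and context-adaptive proposals concrete, here is a minimal NumPy sketch. It is not the paper's architecture: the function names, the shapes, and the choice of a single softmax attention pooling step are all assumptions made for illustration. It only shows the general idea that a fixed query set is identical for every test scene, while proposals derived from image features change with the input.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fixed_queries(learned_queries, image_features):
    # Fixed-query baseline (illustrative): the learned queries are returned
    # unchanged, so they ignore the observed scene at test time.
    return learned_queries


def adaptive_proposals(learned_seeds, image_features):
    # Hypothetical context-adaptive variant: each proposal is an
    # attention-weighted pooling of this scene's image features.
    dim = image_features.shape[-1]
    attn = softmax(learned_seeds @ image_features.T / np.sqrt(dim))
    return attn @ image_features


rng = np.random.default_rng(0)
num_queries, num_tokens, dim = 4, 16, 8

# Stand-ins for trained parameters and for per-scene image features.
weights = rng.normal(size=(num_queries, dim))
scene_a = rng.normal(size=(num_tokens, dim))
scene_b = rng.normal(size=(num_tokens, dim))

fixed_a = fixed_queries(weights, scene_a)
fixed_b = fixed_queries(weights, scene_b)
adaptive_a = adaptive_proposals(weights, scene_a)
adaptive_b = adaptive_proposals(weights, scene_b)

print("fixed queries identical across scenes:", np.array_equal(fixed_a, fixed_b))
print("adaptive proposals identical across scenes:", np.allclose(adaptive_a, adaptive_b))
```

In this toy setup the fixed queries are the same array for both scenes, whereas the adaptive proposals are recomputed from each scene's features and therefore differ between scenes.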
Findings
Achieves state-of-the-art in-domain performance.
Exhibits superior zero-shot out-of-domain generalization.
Reduces runtime by a factor of more than 14.
Abstract
Semantic Scene Completion (SSC) has emerged as a pivotal approach for jointly learning scene geometry and semantics, enabling downstream applications such as navigation in mobile robotics. The recent generalization to Panoptic Scene Completion (PSC) advances the SSC domain by integrating instance-level information, thereby enhancing object-level sensitivity in scene understanding. While PSC was introduced using LiDAR modality, methods based on camera images remain largely unexplored. Moreover, recent Transformer-based approaches utilize a fixed set of learned queries to reconstruct objects within the scene volume. Although these queries are typically updated with image context during training, they remain static at test time, limiting their ability to dynamically adapt specifically to the observed scene. To overcome these limitations, we propose IPFormer, the first method that leverages…
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Advanced Image and Video Retrieval Techniques
Methods: Sparse Evolutionary Training
