Object-Centric Multi-View Aggregation
Shubham Tulsiani, Or Litany, Charles R. Qi, He Wang, Leonidas J. Guibas

TL;DR
This paper introduces an object-centric multi-view aggregation method that constructs a semi-implicit 3D representation without explicit camera pose estimation, improving robustness and flexibility in 3D inference tasks.
Contribution
The approach lifts views into an object-centric canonical coordinate system without explicit camera pose estimation. It accommodates a variable number of views and is independent of view order, and its symmetry-aware mapping from pixels to canonical coordinates helps propagate information to unseen regions.
Findings
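Volumetric reconstruction benefits from the object-centric aggregation, compared with implicit or camera-centric alternatives.
Novel view synthesis benefits from the same aggregate representation.
The symmetry-aware mapping makes inference robust to pose ambiguities.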
Abstract
We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid. Key to our approach is an object-centric canonical 3D coordinate system into which views can be lifted, without explicit camera pose estimation, and then combined -- in a manner that can accommodate a variable number of views and is view order independent. We show that computing a symmetry-aware mapping from pixels to the canonical coordinate system allows us to better propagate information to unseen regions, as well as to robustly overcome pose ambiguities during inference. Our aggregate representation enables us to perform 3D inference tasks like volumetric reconstruction and novel view synthesis, and we use these tasks to demonstrate the benefits of our aggregation approach as compared to implicit or camera-centric…
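The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each view already comes with per-pixel canonical coordinates in [-0.5, 0.5] and per-pixel feature vectors, that features are combined by mean pooling, and that the object has a reflective symmetry about the x = 0 plane. The function name `aggregate_views` and all of its parameters are hypothetical.

```python
import numpy as np


def aggregate_views(coords_per_view, feats_per_view, grid_size=8, symmetric=True):
    """Mean-pool per-pixel features into a canonical voxel grid (illustrative only).

    coords_per_view: list of (N_i, 3) arrays, assumed canonical coords in [-0.5, 0.5].
    feats_per_view:  list of (N_i, C) arrays of per-pixel features.
    Returns the (G, G, G, C) feature grid and a (G, G, G) boolean occupancy mask.
    """
    channels = feats_per_view[0].shape[1]
    feat_sum = np.zeros((grid_size, grid_size, grid_size, channels))
    counts = np.zeros((grid_size, grid_size, grid_size))

    for coords, feats in zip(coords_per_view, feats_per_view):
        variants = [coords]
        if symmetric:
            # Assumption: mirror each point across the x = 0 plane so that
            # features also land in the symmetric, possibly unseen, region.
            variants.append(coords * np.array([-1.0, 1.0, 1.0]))
        for c in variants:
            # Map canonical coordinates to integer voxel indices.
            idx = np.clip(((c + 0.5) * grid_size).astype(int), 0, grid_size - 1)
            cell = (idx[:, 0], idx[:, 1], idx[:, 2])
            # np.add.at accumulates correctly when several pixels hit one voxel.
            np.add.at(feat_sum, cell, feats)
            np.add.at(counts, cell, 1.0)

    occupied = counts > 0
    grid = np.zeros_like(feat_sum)
    grid[occupied] = feat_sum[occupied] / counts[occupied][:, None]
    return grid, occupied


# Usage: three views with different pixel counts and 4-channel features.
rng = np.random.default_rng(0)
coords = [rng.uniform(-0.5, 0.5, size=(n, 3)) for n in (50, 80, 30)]
feats = [rng.normal(size=(c.shape[0], 4)) for c in coords]
grid, occupied = aggregate_views(coords, feats)
```

Because the sketch only sums and counts per voxel, the result does not depend on the order of the views beyond floating-point rounding, and the list can hold any number of views. How the paper predicts the canonical coordinates and which pooling it uses are not stated in the text above, so both are placeholders here.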
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization
