TL;DR
This paper introduces a neural network architecture for simultaneous camera pose and 3D scene structure recovery from multiple images, leveraging permutation equivariance and unsupervised learning, applicable to both calibrated and uncalibrated settings.
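To make the permutation-equivariance idea concrete, here is a minimal NumPy sketch of one linear layer that is equivariant to reordering both cameras (rows) and scene points (columns). The summary does not spell out the layer, so everything here is an assumption: the form is the generic one from the exchangeable-matrix literature, the name `equivariant_layer` and its weight arguments are hypothetical, and the input is a dense tensor, whereas real point tracks have missing entries.

```python
import numpy as np

def equivariant_layer(x, w_self, w_row, w_col, w_all, bias):
    """Hypothetical row/column permutation-equivariant linear layer.

    x has shape (m cameras, n points, d_in); the result has shape (m, n, d_out).
    Each output entry mixes its own feature with means pooled over its row,
    its column, and the whole tensor, so reordering rows or columns of the
    input reorders the output in exactly the same way.
    """
    row_mean = x.mean(axis=1, keepdims=True)       # (m, 1, d_in): pooled over points
    col_mean = x.mean(axis=0, keepdims=True)       # (1, n, d_in): pooled over cameras
    all_mean = x.mean(axis=(0, 1), keepdims=True)  # (1, 1, d_in): pooled over everything
    return (x @ w_self + row_mean @ w_row
            + col_mean @ w_col + all_mean @ w_all + bias)

# Illustrative random data (not from the paper).
rng = np.random.default_rng(0)
m, n, d_in, d_out = 4, 6, 2, 3
x = rng.normal(size=(m, n, d_in))
weights = [rng.normal(size=(d_in, d_out)) for _ in range(4)]
bias = rng.normal(size=d_out)

y = equivariant_layer(x, *weights, bias)

# Permute cameras and points, then check that the output permutes identically.
cam_perm = rng.permutation(m)
pt_perm = rng.permutation(n)
y_perm = equivariant_layer(x[cam_perm][:, pt_perm], *weights, bias)
is_equivariant = np.allclose(y_perm, y[cam_perm][:, pt_perm])
```

Because the only cross-entry interactions are means over a full row, a full column, or the whole tensor, the check holds for any choice of weights, not just the random ones used here.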
Contribution
The paper presents a deep network that jointly estimates camera parameters and sparse scene structure from point tracks without initialization, inspired by projective factorization for Structure from Motion (SFM) and by deep matrix completion techniques.
Findings
Recovers camera poses and scene structure with accuracy comparable to classical methods.
Works in both calibrated and uncalibrated scenarios.
A pre-trained model can be adapted to new scenes through fine-tuning.
Abstract
Existing deep methods produce highly accurate 3D reconstructions in stereo and multiview stereo settings, i.e., when cameras are both internally and externally calibrated. Nevertheless, the challenge of simultaneous recovery of camera poses and 3D scene structure in multiview settings with deep networks is still outstanding. Inspired by projective factorization for Structure from Motion (SFM) and by deep matrix completion techniques, we propose a neural network architecture that, given a set of point tracks in multiple images of a static scene, recovers both the camera parameters and a (sparse) scene structure by minimizing an unsupervised reprojection loss. Our network architecture is designed to respect the structure of the problem: the sought output is equivariant to permutations of both cameras and scene points. Notably, our method does not require initialization of camera…
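The reprojection loss named in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's exact objective: the function name `reprojection_loss`, the 3x4 camera-matrix parameterization, the boolean visibility mask, and the plain mean of pixel distances are all choices made here for illustration. The sketch also assumes every visible point has nonzero depth.

```python
import numpy as np

def reprojection_loss(P, X, tracks, visible):
    """Mean pixel distance between projected points and observed tracks.

    P:       (m, 3, 4) camera matrices (assumed parameterization)
    X:       (n, 4)    scene points in homogeneous coordinates
    tracks:  (m, n, 2) observed image positions of each point in each camera
    visible: (m, n)    boolean mask, True where a point is observed
    """
    proj = np.einsum('mij,nj->mni', P, X)       # (m, n, 3) homogeneous projections
    pixels = proj[..., :2] / proj[..., 2:3]     # perspective division by depth
    err = np.linalg.norm(pixels - tracks, axis=-1)
    return err[visible].mean()                  # only observed entries contribute

# Synthetic scene for illustration: cameras [I | t] looking at points in front of them.
rng = np.random.default_rng(0)
m, n = 3, 10
P = np.zeros((m, 3, 4))
P[:, :, :3] = np.eye(3)
P[:, :, 3] = rng.uniform(-0.1, 0.1, size=(m, 3))
X = np.ones((n, 4))
X[:, :2] = rng.uniform(-1.0, 1.0, size=(n, 2))
X[:, 2] = rng.uniform(4.0, 6.0, size=n)         # positive depth for every point

# Noise-free tracks generated from the ground-truth cameras and points.
proj = np.einsum('mij,nj->mni', P, X)
tracks = proj[..., :2] / proj[..., 2:3]
visible = rng.random((m, n)) < 0.8
visible[0, 0] = True                            # guarantee at least one observation

loss_true = reprojection_loss(P, X, tracks, visible)

# Perturbing the 3D points should increase the loss.
X_noisy = X.copy()
X_noisy[:, :3] += 0.1 * rng.normal(size=(n, 3))
loss_noisy = reprojection_loss(P, X_noisy, tracks, visible)
```

The loss needs only the observed tracks, not ground-truth cameras or 3D points, which is what allows it to be minimized without supervision.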
