FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow
Cameron Smith, Yilun Du, Ayush Tewari, Vincent Sitzmann

TL;DR
FlowCam is a self-supervised method that reconstructs 3D neural scene representations and camera poses directly from video. It removes the dependence on precomputed poses from structure-from-motion, which is prohibitively expensive to run at scale, and so makes 3D scene learning on large video collections practical.
Contribution
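The paper jointly estimates camera poses and a generalizable neural scene representation from video frames, online and in a single forward pass. Frame-to-frame optical flow is lifted to 3D scene flow via differentiable rendering, and the SE(3) camera pose is then recovered by a weighted least-squares fit to that scene flow field.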
Findings
Robust performance on diverse real-world videos.
Effective pose estimation without external structure-from-motion.
End-to-end training on challenging sequences.
Abstract
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning. The key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion, which is prohibitively expensive to run at scale. We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass. We estimate poses by first lifting frame-to-frame optical flow to 3D scene flow via differentiable rendering, preserving locality and shift-equivariance of the image processing backbone. SE(3) camera pose estimation is then performed via a weighted least-squares fit to the scene flow field. This formulation enables us to jointly supervise pose estimation and a generalizable neural scene representation via…
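As an illustration of the weighted least-squares pose fit described in the abstract, here is a minimal sketch. It assumes the scene flow is available as paired 3D points (each surface point at frame t and the same point at frame t+1, both in camera coordinates) with a per-point confidence weight, and it solves the fit in closed form with the weighted Procrustes (Kabsch) method. The abstract does not state which solver the authors use, so the function name `weighted_rigid_fit`, its arguments, and the synthetic data are all hypothetical.

```python
import numpy as np


def weighted_rigid_fit(src, dst, weights):
    """Closed-form solution of min_{R,t} sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding 3D points.
    weights:  (N,) non-negative confidences (hypothetical; e.g. predicted per pixel).
    Returns a rotation matrix R (3, 3) and translation t (3,).
    """
    w = weights / weights.sum()

    # Weighted centroids of both point sets.
    src_mean = (w[:, None] * src).sum(axis=0)
    dst_mean = (w[:, None] * dst).sum(axis=0)
    src_c = src - src_mean
    dst_c = dst - dst_mean

    # 3x3 weighted cross-covariance between the centered point sets.
    H = (w[:, None] * src_c).T @ dst_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t


# Synthetic check: points moved by a known rigid motion, with 20 corrupted
# correspondences that receive zero confidence.
rng = np.random.default_rng(0)
points_t = rng.normal(size=(200, 3))
angle = 0.3
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.1, -0.2, 0.05])
points_t1 = points_t @ R_true.T + t_true

confidence = np.ones(200)
points_t1[:20] += rng.normal(size=(20, 3))  # simulate bad flow
confidence[:20] = 0.0                       # down-weight it entirely

R_est, t_est = weighted_rigid_fit(points_t, points_t1, confidence)
```

Every step here (weighted means, a 3x3 SVD, matrix products) is differentiable almost everywhere, which is one reason a closed-form fit of this kind can sit inside an end-to-end trained pipeline. The weights let unreliable correspondences be suppressed without a separate outlier-rejection loop.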
Taxonomy
Topics: Advanced Vision and Imaging · Domain Adaptation and Few-Shot Learning · Robotics and Sensor-Based Localization
