TL;DR
NAS3R is a self-supervised feed-forward framework that jointly learns explicit 3D geometry and camera parameters from 2D images without ground-truth annotations, using a shared transformer backbone regulated by masked attention and a depth-based Gaussian formulation.
Contribution
The paper introduces NAS3R, a self-supervised 3D reconstruction method that predicts explicit geometry and camera parameters jointly, without ground-truth annotations or pretrained priors.
Findings
NAS3R outperforms existing self-supervised methods in 3D reconstruction quality.
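The framework reconstructs 3D Gaussians from uncalibrated and unposed context views and renders target views with its own predicted camera parameters.
NAS3R is compatible with state-of-the-art supervised 3D reconstruction architectures and can incorporate pretrained priors or intrinsic information when available.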
Abstract
In this paper, we introduce NAS3R, a self-supervised feed-forward framework that jointly learns explicit 3D geometry and camera parameters with no ground-truth annotations and no pretrained priors. During training, NAS3R reconstructs 3D Gaussians from uncalibrated and unposed context views and renders target views using its self-predicted camera parameters, enabling self-supervised training from 2D photometric supervision. To ensure stable convergence, NAS3R integrates reconstruction and camera prediction within a shared transformer backbone regulated by masked attention, and adopts a depth-based Gaussian formulation that facilitates well-conditioned optimization. The framework is compatible with state-of-the-art supervised 3D reconstruction architectures and can incorporate pretrained priors or intrinsic information when available. Extensive experiments show that NAS3R achieves…
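The abstract does not spell out the depth-based Gaussian formulation. As a rough illustration of the general idea only, the sketch below places a Gaussian center by back-projecting a pixel along its camera ray using a predicted depth and pinhole intrinsics. The function names, the pinhole model, and the parameters `fx`, `fy`, `cx`, `cy` are assumptions made for this example and are not taken from the paper.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def unproject_pixel(u: float, v: float, depth: float,
                    fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Back-project pixel (u, v) with z-depth `depth` into camera coordinates.

    Hypothetical helper: assumes a standard pinhole camera with focal
    lengths (fx, fy) and principal point (cx, cy).
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def project_point(point: Point3D,
                  fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a camera-space point back to pixel coordinates (pinhole model)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


# A pixel at the principal point lands on the optical axis at the given depth.
center = unproject_pixel(320.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# Round trip: unprojecting and reprojecting recovers the original pixel.
gaussian_center = unproject_pixel(420.0, 140.0, 2.5, 500.0, 500.0, 320.0, 240.0)
pixel = project_point(gaussian_center, 500.0, 500.0, 320.0, 240.0)
```

With this kind of parameterization each Gaussian center has a single free scalar (depth) per pixel rather than three unconstrained coordinates, which is one plausible reading of why a depth-based formulation might make optimization better conditioned.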
