Learning 3D Reconstruction with Priors in Test Time
Lei Zhou, Haoyu Wu, Akshat Dave, Dimitris Samaras

TL;DR
This paper presents a test-time optimization framework for multiview Transformers that uses available priors (e.g., camera poses, intrinsics, and depth) to improve 3D reconstruction without retraining, with consistent accuracy gains across benchmarks.
Contribution
The authors introduce a test-time constrained optimization approach that converts available priors into penalty terms on the network's predictions, improving 3D vision performance without retraining and without feeding the priors into the pre-trained architecture.
Findings
Reduces point-map error by over 50% on multiple datasets.
Outperforms retrained prior-aware feed-forward methods.
Consistently improves performance across diverse 3D benchmarks.
Abstract
We introduce a test-time framework for multiview Transformers (MVTs) that incorporates priors (e.g., camera poses, intrinsics, and depth) to improve 3D tasks without retraining or modifying pre-trained image-only networks. Rather than feeding priors into the architecture, we cast them as constraints on the predictions and optimize the network at inference time. The optimization loss consists of a self-supervised objective and prior penalty terms. The self-supervised objective captures the compatibility among multi-view predictions and is implemented using photometric or geometric loss between renderings from other views and each view itself. Any available priors are converted into penalty terms on the corresponding output modalities. Across a series of 3D vision benchmarks, including point map estimation and camera pose estimation, our method consistently improves performance over base…
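The loss structure described in the abstract (a self-supervised consistency objective plus a penalty term for each available prior) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the weight `lam`, and the scalar "depth" variables are all hypothetical, and for simplicity the sketch optimizes two predictions directly rather than the weights of a network. Two views each predict a depth for the same point, the consistency term asks the views to agree, and a prior on view 0 enters as a quadratic penalty.

```python
def total_loss(d, prior, lam):
    """Consistency between the two views plus a penalty per available prior.

    d     : [d0, d1], toy per-view depth predictions (hypothetical stand-ins
            for the network outputs)
    prior : dict mapping a view index to its known depth; may be empty
    lam   : penalty weight (an assumed hyperparameter)
    """
    consistency = (d[0] - d[1]) ** 2
    penalty = sum((d[i] - p) ** 2 for i, p in prior.items())
    return consistency + lam * penalty


def gradient(d, prior, lam):
    """Analytic gradient of total_loss with respect to d."""
    diff = d[0] - d[1]
    g = [2.0 * diff, -2.0 * diff]
    for i, p in prior.items():
        g[i] += 2.0 * lam * (d[i] - p)
    return g


def optimize(d_init, prior, lam=1.0, lr=0.1, steps=500):
    """Plain gradient descent at 'test time' on the toy predictions."""
    d = list(d_init)
    for _ in range(steps):
        g = gradient(d, prior, lam)
        d = [x - lr * gx for x, gx in zip(d, g)]
    return d


if __name__ == "__main__":
    initial = [1.0, 3.0]   # the two views disagree
    prior = {0: 2.0}       # a depth prior is available for view 0 only
    refined = optimize(initial, prior)
    print(refined, total_loss(refined, prior, 1.0))
```

In this toy setup the prior constrains only view 0, yet the consistency term carries the correction over to view 1, which mirrors the role the abstract assigns to the self-supervised objective. Priors that are not available are simply left out of the `prior` dictionary, so the same loss covers every combination of priors.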
