The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images with Minimal 3D Knowledge
Haoru Wang, Kai Ye, Minghan Qin, Yangyan Li, Wenzheng Chen, Baoquan Chen

TL;DR
This paper demonstrates that less dependence on explicit 3D knowledge in novel view synthesis leads to better scalability and performance, enabling a pose-free, data-centric approach that outperforms traditional methods.
Contribution
The authors introduce a scalable, pose-free NVS framework that learns implicit 3D understanding from large-scale 2D images, surpassing methods relying on explicit 3D data.
Findings
Methods that depend less on explicit 3D knowledge improve faster as training data increases.
The proposed pose-free model achieves state-of-the-art results without pose annotations.
Reliance on explicit 3D knowledge limits scalability and performance.
Abstract
Recent advances in feed-forward Novel View Synthesis (NVS) have led to a divergence between two design philosophies: bias-driven methods, which rely on explicit 3D knowledge, such as handcrafted 3D representations (e.g., NeRF and 3DGS) and camera poses annotated by Structure-from-Motion algorithms, and data-centric methods, which learn to understand 3D structure implicitly from large-scale imagery data. This raises a fundamental question: which paradigm is more scalable in an era of ever-increasing data availability? In this work, we conduct a comprehensive analysis of existing methods and uncover a critical trend that the performance of methods requiring less 3D knowledge accelerates more as training data increases, eventually outperforming their 3D knowledge-driven counterparts, which we term "the less you depend, the more you learn." Guided by this finding, we design a feed-forward…
