Limitations of NERF with pre-trained Vision Features for Few-Shot 3D Reconstruction
Ankit Sanjyal

TL;DR
This paper systematically evaluates pre-trained vision features (DINO) in NeRF for few-shot 3D reconstruction and finds that every feature-enhanced variant underperforms a plain NeRF baseline, possibly because the features introduce harmful biases.
Contribution
It provides a comprehensive evaluation showing that pre-trained vision features like DINO do not improve and may hinder few-shot NeRF-based 3D reconstruction.
Findings
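All DINO-enhanced NeRF variants (frozen features, LoRA fine-tuned features, multi-scale feature fusion) perform worse than baseline NeRF: PSNR of about 12.9 to 13.0 versus 14.71.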
Pre-trained vision features may introduce harmful biases.
Simpler geometric models may outperform feature-based approaches.
Abstract
Neural Radiance Fields (NeRF) have revolutionized 3D scene reconstruction from sparse image collections. Recent work has explored integrating pre-trained vision features, particularly from DINO, to enhance few-shot reconstruction capabilities. However, the effectiveness of such approaches remains unclear, especially in extreme few-shot scenarios. In this paper, we present a systematic evaluation of DINO-enhanced NeRF models, comparing baseline NeRF, frozen DINO features, LoRA fine-tuned features, and multi-scale feature fusion. Surprisingly, our experiments reveal that all DINO variants perform worse than the baseline NeRF, achieving PSNR values around 12.9 to 13.0 compared to the baseline's 14.71. This counterintuitive result suggests that pre-trained vision features may not be beneficial for few-shot 3D reconstruction and may even introduce harmful biases. We analyze potential causes…
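To put the reported PSNR gap in perspective, the sketch below computes PSNR from its standard definition and converts the two reported scores back into mean squared error. It is not the paper's evaluation code: the function names `psnr` and `mse_from_psnr` are hypothetical, and the pixel range `max_val=1.0` is an assumption, since the abstract does not state how images were normalized. The MSE ratio does not depend on that assumption.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_val is the largest possible pixel value (assumed 1.0 here; the
    source does not state the normalization it used).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_from_psnr(psnr_db, max_val=1.0):
    """Invert the PSNR definition to recover the mean squared error."""
    return max_val ** 2 / (10.0 ** (psnr_db / 10.0))


# Scores quoted in the abstract: baseline 14.71, DINO variants about 12.9 to 13.0.
baseline_mse = mse_from_psnr(14.71)
dino_mse = mse_from_psnr(13.0)
ratio = dino_mse / baseline_mse  # independent of max_val

print(f"DINO-variant MSE is {ratio:.2f}x the baseline MSE")
```

Because PSNR is logarithmic, a gap of roughly 1.7 to 1.8 dB corresponds to the DINO variants having about 1.5 times the pixel-wise squared error of the baseline.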
