VFM-Recon: Unlocking Cross-Domain Scene-Level Neural Reconstruction with Scale-Aligned Foundation Priors
Yuhang Ming, Tingkang Xi, Xingrui Yang, Lixin Yang, Yong Peng, Cewu Lu, Wanzeng Kong

TL;DR
VFM-Recon introduces a novel method that combines vision foundation model priors with scale alignment to improve scene-level neural reconstruction across diverse datasets, including challenging outdoor environments.
Contribution
It is the first to integrate transferable VFM priors with scale consistency in neural scene reconstruction, enhancing cross-domain robustness and accuracy.
Findings
Achieves state-of-the-art performance on multiple datasets.
Substantially outperforms previous methods on outdoor datasets.
Successfully maintains scale coherence across views.
Abstract
Scene-level neural volumetric reconstruction from monocular videos remains challenging, especially under severe domain shifts. Although recent advances in vision foundation models (VFMs) provide transferable generalized priors learned from large-scale data, their scaleambiguous predictions are incompatible with the scale consistency required by volumetric fusion. To address this gap, we present VFMRecon, the first attempt to bridge transferable VFM priors with scaleconsistent requirements in scene-level neural reconstruction. Specifically, we first introduce a lightweight scale alignment stage that restores multiview scale coherence. We then integrate pretrained VFM features into the neural volumetric reconstruction pipeline via lightweight task-specific adapters, which are trained for reconstruction while preserving the crossdomain robustness of pretrained representations. We train our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Vision and Imaging · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
