Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation
Weining Ren, Hongjun Wang, Xiao Tan, Kai Han

TL;DR
Fin3R introduces a lightweight fine-tuning approach for 3D reconstruction models that enhances geometric detail and robustness by distilling knowledge from monocular teacher models without increasing inference costs.
Contribution
The paper proposes a novel fine-tuning method that enriches 3D reconstruction models with geometric details via monocular knowledge distillation, improving accuracy and detail recovery.
Findings
Models achieve sharper boundaries and complex structures.
Fine-tuning improves geometric accuracy in various settings.
Memory and latency remain unchanged during inference.
Abstract
We present Fin3R, a simple, effective, and general fine-tuning method for feed-forward 3D reconstruction models. The family of feed-forward reconstruction model regresses pointmap of all input images to a reference frame coordinate system, along with other auxiliary outputs, in a single forward pass. However, we find that current models struggle with fine geometry and robustness due to (\textit{i}) the scarcity of high-fidelity depth and pose supervision and (\textit{ii}) the inherent geometric misalignment from multi-view pointmap regression. Fin3R jointly tackles two issues with an extra lightweight fine-tuning step. We freeze the decoder, which handles view matching, and fine-tune only the image encoder-the component dedicated to feature extraction. The encoder is enriched with fine geometric details distilled from a strong monocular teacher model on large, unlabeled datasets, using…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
Topics3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
