Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation

Weining Ren; Hongjun Wang; Xiao Tan; Kai Han

arXiv:2511.22429·cs.CV·December 1, 2025

TL;DR

Fin3R introduces a lightweight fine-tuning approach for 3D reconstruction models that enhances geometric detail and robustness by distilling knowledge from monocular teacher models without increasing inference costs.

Contribution

The paper proposes a novel fine-tuning method that enriches 3D reconstruction models with geometric details via monocular knowledge distillation, improving accuracy and detail recovery.

Findings

01 Fine-tuned models recover sharper boundaries and more complex structures.

02 Fine-tuning improves geometric accuracy in various settings.

03 Inference memory and latency remain unchanged.

Abstract

We present Fin3R, a simple, effective, and general fine-tuning method for feed-forward 3D reconstruction models. This family of models regresses pointmaps for all input images in the coordinate system of a reference frame, along with other auxiliary outputs, in a single forward pass. However, we find that current models struggle with fine geometry and robustness due to (i) the scarcity of high-fidelity depth and pose supervision and (ii) the inherent geometric misalignment from multi-view pointmap regression. Fin3R jointly tackles these two issues with an extra lightweight fine-tuning step. We freeze the decoder, which handles view matching, and fine-tune only the image encoder, the component dedicated to feature extraction. The encoder is enriched with fine geometric details distilled from a strong monocular teacher model on large, unlabeled datasets, using…
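The freeze-the-decoder, tune-the-encoder idea from the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the linear encoder and decoder, the linear stand-in teacher, the mean-squared-error loss, and all names (`W_enc`, `W_dec`, `w_teacher`, `distill_loss`) are assumptions chosen only to show gradients reaching the encoder through a decoder that is never updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 64 unlabeled "images", each flattened to 8 features.
x = rng.normal(size=(64, 8))

W_enc = rng.normal(scale=0.5, size=(8, 4))  # student encoder (trainable)
W_dec = rng.normal(scale=0.5, size=(4, 1))  # student decoder (frozen)
w_teacher = rng.normal(size=(8, 1))         # stand-in for a monocular teacher

W_dec_before = W_dec.copy()

# Teacher predictions act as pseudo-labels, so no ground truth is required.
teacher_out = x @ w_teacher


def distill_loss(enc_weights):
    """Mean squared error between student and teacher predictions."""
    pred = (x @ enc_weights) @ W_dec
    return float(np.mean((pred - teacher_out) ** 2))


loss_start = distill_loss(W_enc)

lr = 0.01
for _ in range(500):
    feats = x @ W_enc                          # encoder forward pass
    pred = feats @ W_dec                       # frozen decoder forward pass
    grad_pred = 2.0 * (pred - teacher_out) / len(x)
    grad_feats = grad_pred @ W_dec.T           # backpropagate through decoder
    grad_W_enc = x.T @ grad_feats              # gradient w.r.t. encoder only
    W_enc -= lr * grad_W_enc                   # W_dec is never updated

loss_end = distill_loss(W_enc)
print(f"distillation loss: {loss_start:.4f} -> {loss_end:.4f}")
```

In this sketch the parameter shapes are identical before and after fine-tuning, which mirrors the claim that inference memory and latency do not change; only the encoder weights differ.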

Videos

Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation · SlidesLive

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization