Reliev3R: Relieving Feed-forward Reconstruction from Multi-View Geometric Annotations
Youyu Chen, Junjun Jiang, Yueru Luo, Kui Jiang, Xianming Liu, Xu Yan, Dave Zhenyu Chen

TL;DR
Reliev3R is a weakly-supervised method for training feed-forward reconstruction models from scratch without multi-view geometric annotations. It supervises training with monocular relative depths and sparse image correspondences predicted zero-shot by pretrained models, and achieves competitive results.
Contribution
It presents a training paradigm for feed-forward reconstruction models (FFRMs) that replaces costly multi-view geometric annotations, such as 3D point maps and camera poses, with monocular relative depths and sparse image correspondences.
Findings
Reliev3R achieves comparable performance to fully-supervised models.
The method removes the need for geometric sensor data and compute-intensive structure-from-motion preprocessing in 3D reconstruction training.
It demonstrates effective training from scratch without geometric annotations.
Abstract
With recent advances, Feed-forward Reconstruction Models (FFRMs) have demonstrated great potential in reconstruction quality and adaptability to multiple downstream tasks. However, their heavy reliance on multi-view geometric annotations, e.g., 3D point maps and camera poses, makes the fully-supervised training scheme of FFRMs difficult to scale up. In this paper, we propose Reliev3R, a weakly-supervised paradigm for training FFRMs from scratch without cost-prohibitive multi-view geometric annotations. Relieving the reliance on geometric sensor data and compute-intensive structure-from-motion preprocessing, our method draws 3D knowledge directly from monocular relative depths and sparse image correspondences given by zero-shot predictions of pretrained models. At the core of Reliev3R, we design an ambiguity-aware relative depth loss and a trigonometry-based reprojection loss to…
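The abstract is truncated before it defines the ambiguity-aware relative depth loss, so the sketch below does not reproduce it. As a rough illustration of why relative depth needs special handling, it shows a generic scale-and-shift-invariant depth error, a common stand-in when a monocular depth map is only known up to an unknown scale and offset. The function name and the choice of a least-squares alignment followed by a mean absolute error are assumptions made for this example, not details from the paper.

```python
from typing import Sequence


def scale_shift_invariant_depth_loss(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error after aligning pred to ref with a least-squares scale and shift.

    Hypothetical helper for illustration only; NOT the paper's ambiguity-aware loss.
    """
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")

    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n

    # Closed-form solution of min over (s, t) of sum (s * p + t - r)^2.
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov_pr = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    s = cov_pr / var_p if var_p > 0 else 0.0  # constant prediction: fall back to shift only
    t = mean_r - s * mean_p

    # Residual that remains once the unknown scale and shift are factored out.
    return sum(abs(s * p + t - r) for p, r in zip(pred, ref)) / n


# Usage: a prediction that differs from the reference only by scale and shift
# has (numerically) zero loss; a prediction with one distorted value does not.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
affine_pred = [2.0 * r + 3.0 for r in reference]
distorted_pred = affine_pred[:-1] + [20.0]
loss_affine = scale_shift_invariant_depth_loss(affine_pred, reference)
loss_distorted = scale_shift_invariant_depth_loss(distorted_pred, reference)
```

A loss of this kind penalizes only errors in depth structure, which is what allows relative depths from a pretrained monocular model to serve as a supervision signal without metric ground truth.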
