Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment

Youming Deng; Songyou Peng; Junyi Zhang; Kathryn Heal; Tiancheng Sun; John Flynn; Steve Marschner; Lucy Chai

arXiv:2512.08930 · cs.CV · December 23, 2025

TL;DR

Selfi aligns the features of a vision foundation model (VGGT) so that they are geometrically consistent across views, which improves both pose estimation and novel view synthesis.

Contribution

The paper introduces Selfi, a self-improving pipeline that aligns foundation-model features for multi-view consistency, using the backbone's own outputs as pseudo-ground-truth, to improve 3D reconstruction.

Findings

01 State-of-the-art results in NVS and pose estimation

02 Improved multi-view geometric consistency

03 Effective feature alignment enhances downstream 3D tasks

Abstract

Novel View Synthesis (NVS) has traditionally relied on models with explicit 3D inductive biases, together with camera parameters estimated beforehand by Structure-from-Motion (SfM). Recent vision foundation models like VGGT take an orthogonal approach: 3D knowledge is gained implicitly through training data and loss objectives, enabling feed-forward prediction of both camera parameters and 3D representations directly from a set of uncalibrated images. While flexible, VGGT features lack explicit multi-view geometric consistency, and we find that improving such 3D feature consistency benefits both NVS and pose estimation tasks. We introduce Selfi, a self-improving 3D reconstruction pipeline based on feature alignment that transforms a VGGT backbone into a high-fidelity 3D reconstruction engine by leveraging its own outputs as pseudo-ground-truth. Specifically, we train a lightweight feature…
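The abstract is cut off before it describes the training procedure, so the following does not reproduce Selfi's method. It is only a minimal NumPy sketch of one common way to quantify the "multi-view geometric consistency" of per-pixel features: reproject the pixels of view A into view B using A's depth and the relative pose, then compare the features at corresponding locations. The function names `reproject` and `feature_consistency`, the shared intrinsics matrix `K`, the pose convention (`R_ab`, `t_ab` map camera-A coordinates to camera-B coordinates), and the nearest-neighbour lookup are all assumptions made for illustration.

```python
import numpy as np


def reproject(depth_a, K, R_ab, t_ab):
    """Project every pixel of view A into view B (hypothetical helper).

    depth_a: (H, W) depth along the camera-A z axis.
    K:       (3, 3) intrinsics, assumed shared by both views.
    R_ab, t_ab: rotation (3, 3) and translation (3,) taking A coords to B coords.
    Returns pixel coordinates in B, shape (H, W, 2), and depth in B, shape (H, W).
    """
    h, w = depth_a.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T           # back-project pixels to rays with z = 1
    pts_a = rays * depth_a.reshape(-1, 1)     # 3D points in the camera-A frame
    pts_b = pts_a @ R_ab.T + t_ab             # the same points in the camera-B frame
    proj = pts_b @ K.T
    z = proj[:, 2:3]
    safe_z = np.where(z > 0, z, 1.0)          # avoid dividing by zero; masked out later
    uv_b = proj[:, :2] / safe_z
    return uv_b.reshape(h, w, 2), pts_b[:, 2].reshape(h, w)


def feature_consistency(feat_a, feat_b, uv_b, z_b):
    """Mean cosine similarity between A's features and B's features at the
    reprojected pixel locations (nearest-neighbour lookup, in-bounds pixels only)."""
    h, w, _ = feat_b.shape
    u = np.rint(uv_b[..., 0]).astype(int)
    v = np.rint(uv_b[..., 1]).astype(int)
    valid = (z_b > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    if not valid.any():
        return float("nan")
    fa = feat_a[valid]
    fb = feat_b[v[valid], u[valid]]
    fa = fa / np.linalg.norm(fa, axis=-1, keepdims=True)
    fb = fb / np.linalg.norm(fb, axis=-1, keepdims=True)
    return float((fa * fb).sum(axis=-1).mean())
```

A score near 1 means that features describing the same 3D point agree across the two views. In a setting like the one the abstract describes, the depth and pose fed to such a check would themselves be predictions rather than ground truth, which is why the correspondence is only a pseudo-ground-truth signal.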

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis