TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction

Fengyi Zhang; Tianjun Zhang; Kasra Khosoussi; Zheng Zhang; Zi Huang; Yadan Luo

arXiv:2512.02341·cs.CV·March 19, 2026

Open Access

TL;DR

This paper introduces TALO, a novel alignment framework for 3D vision foundation models that enhances temporal consistency and robustness in online 3D reconstruction tasks across various models and camera setups.

Contribution

We propose a higher-DOF, long-term alignment method using Thin Plate Spline and a point-agnostic registration approach, improving consistency and robustness in online 3D reconstruction.
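As a rough illustration of what a Thin Plate Spline alignment involves, the sketch below fits a 3D spline that maps one set of control points onto another and then warps arbitrary points with it. It is a generic textbook formulation, not the paper's implementation: the function names `fit_tps_3d` and `apply_tps_3d`, the kernel choice U(r) = r (the usual biharmonic kernel in 3D), exact interpolation without smoothing, and the synthetic control points are all assumptions made for this example.

```python
import numpy as np


def fit_tps_3d(src, dst):
    """Fit a 3D thin plate spline that maps src control points onto dst.

    src, dst: (n, 3) arrays of corresponding control points (hypothetical inputs).
    Returns non-affine weights (n, 3) and an affine part (4, 3).
    """
    n = src.shape[0]
    # Pairwise distances between control points; kernel U(r) = r is assumed here.
    K = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    # Affine basis [1, x, y, z] evaluated at each control point.
    P = np.hstack([np.ones((n, 1)), src])
    # Standard TPS linear system with side conditions P^T w = 0.
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 4, 3))
    b[:n] = dst
    params = np.linalg.solve(A, b)
    return params[:n], params[n:]


def apply_tps_3d(points, src, weights, affine):
    """Warp arbitrary (m, 3) points with a spline fitted on src control points."""
    U = np.linalg.norm(points[:, None, :] - src[None, :, :], axis=-1)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return U @ weights + P @ affine


# Synthetic control points with a smooth, spatially varying displacement.
rng = np.random.default_rng(0)
src_ctrl = rng.uniform(-1.0, 1.0, size=(8, 3))
dst_ctrl = src_ctrl + 0.05 * np.sin(3.0 * src_ctrl)

weights, affine = fit_tps_3d(src_ctrl, dst_ctrl)
warped_ctrl = apply_tps_3d(src_ctrl, src_ctrl, weights, affine)
max_ctrl_error = np.abs(warped_ctrl - dst_ctrl).max()

# Any other point in the same frame can be warped with the same spline.
query = rng.uniform(-1.0, 1.0, size=(5, 3))
warped_query = apply_tps_3d(query, src_ctrl, weights, affine)
```

Because the spline interpolates every control point, the displacement it applies can differ from one region to another, which is what "higher-DOF" means relative to a single rigid or similarity transform. The control points must not all lie in one plane, otherwise the linear system above is singular.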

Findings

01. Achieves more coherent geometry and lower trajectory errors

02. Demonstrates robustness across multiple datasets and models

03. Compatible with diverse camera configurations

Abstract

3D vision foundation models have shown strong generalization in reconstructing key 3D attributes from uncalibrated images through a single feed-forward pass. However, when deployed in online settings such as driving scenarios, predictions are made over temporal windows, making it non-trivial to maintain consistency across time. Recent strategies align consecutive predictions by solving for a global transformation, yet our analysis reveals their fundamental limitations in assumption validity, local alignment scope, and robustness under noisy geometry. In this work, we propose a higher-DOF and long-term alignment framework based on Thin Plate Spline, leveraging globally propagated control points to correct spatially varying inconsistencies. In addition, we adopt a point-agnostic submap registration design that is inherently robust to noisy geometry predictions. The proposed framework is fully…
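To make the contrast with a single global transformation concrete, the sketch below estimates a similarity transform (scale, rotation, translation) between two point sets using the closed-form Umeyama solution. The abstract does not say which transformation family the earlier strategies solve for, so Sim(3) is an assumption here, as are the names `umeyama_sim3` and `rms_residual` and the synthetic data.

```python
import numpy as np


def umeyama_sim3(src, dst):
    """Closed-form least-squares similarity transform with dst ~ s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / src.shape[0]
    U, S, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / src.shape[0]
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def rms_residual(src, dst, s, R, t):
    """Root-mean-square distance between the transformed src and dst."""
    aligned = s * src @ R.T + t
    return np.sqrt(((aligned - dst) ** 2).sum(axis=1).mean())


# Synthetic test: a known similarity transform is recovered exactly.
rng = np.random.default_rng(1)
src_pts = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
s_true = 2.5
t_true = np.array([1.0, -2.0, 0.5])
dst_pts = s_true * src_pts @ R_true.T + t_true

s_est, R_est, t_est = umeyama_sim3(src_pts, dst_pts)
rms_exact = rms_residual(src_pts, dst_pts, s_est, R_est, t_est)

# Add a smooth, spatially varying distortion that no single Sim(3) can absorb.
dst_bent = dst_pts + 0.1 * np.sin(src_pts)
s_b, R_b, t_b = umeyama_sim3(src_pts, dst_bent)
rms_bent = rms_residual(src_pts, dst_bent, s_b, R_b, t_b)
```

In this toy setup the best-fit similarity transform leaves a nonzero residual once the target is distorted non-uniformly, which is the kind of spatially varying inconsistency the abstract says a global transformation cannot correct.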

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques