TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction
Fengyi Zhang, Tianjun Zhang, Kasra Khosoussi, Zheng Zhang, Zi Huang, Yadan Luo

TL;DR
This paper introduces TALO, a novel alignment framework for 3D vision foundation models that enhances temporal consistency and robustness in online 3D reconstruction tasks across various models and camera setups.
Contribution
We propose a higher-DOF, long-term alignment method using Thin Plate Spline and a point-agnostic registration approach, improving consistency and robustness in online 3D reconstruction.
Findings
Achieves more coherent geometry and lower trajectory errors
Demonstrates robustness across multiple datasets and models
Compatible with diverse camera configurations
Abstract
3D vision foundation models have shown strong generalization in reconstructing key 3D attributes from uncalibrated images through a single feed-forward pass. However, when deployed in online settings such as driving scenarios, predictions are made over temporal windows, making it non-trivial to maintain consistency across time. Recent strategies align consecutive predictions by solving global transformation, yet our analysis reveals their fundamental limitations in assumption validity, local alignment scope, and robustness under noisy geometry. In this work, we propose a higher-DOF and long-term alignment framework based on Thin Plate Spline, leveraging globally propagated control points to correct spatially varying inconsistencies. In addition, we adopt a point-agnostic submap registration design that is inherently robust to noisy geometry predictions. The proposed framework is fully…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
