Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video

Zihui Gao; Ke Liu; Donny Y. Chen; Duochao Shi; Guosheng Lin; Hao Chen; Chunhua Shen

arXiv:2602.07891 · cs.CV · February 10, 2026

Open Access

TL;DR

This paper introduces SAGE, a framework that adapts 3D geometric foundation models using weak supervision from Internet video, improving their zero-shot generalization.

Contribution

SAGE is described as the first scalable method for adapting 3D geometric foundation models to raw Internet video, using hierarchical mining and hybrid supervision.

Findings

01. Reduces Chamfer Distance by 20-42% on benchmarks

02. Improves zero-shot generalization of 3D models

03. Establishes a scalable paradigm for 3D learning from videos
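
For readers unfamiliar with the metric behind the first finding, the following is a minimal sketch of a symmetric Chamfer Distance between two point clouds, plus the relative-reduction arithmetic that a figure such as "20-42%" implies. It assumes the unsquared, mean-of-nearest-neighbour variant; the exact variant, benchmarks, and evaluation protocol used in the paper are not stated on this page. The function names `chamfer_distance` and `relative_reduction` are illustrative only.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _nearest_distance(p: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from p to its nearest neighbour in cloud."""
    return math.sqrt(min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud))


def chamfer_distance(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance (assumed variant: unsquared, averaged).

    Sum of the mean nearest-neighbour distance pred -> ref and ref -> pred.
    Brute force O(N*M); fine for illustration, too slow for large clouds.
    """
    if not pred or not ref:
        raise ValueError("both point clouds must be non-empty")
    pred_to_ref = sum(_nearest_distance(p, ref) for p in pred) / len(pred)
    ref_to_pred = sum(_nearest_distance(q, pred) for q in ref) / len(ref)
    return pred_to_ref + ref_to_pred


def relative_reduction(baseline: float, adapted: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - adapted) / baseline


# Toy data (hypothetical, not from the paper): prediction offset by 0.1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
prediction = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]
cd = chamfer_distance(prediction, reference)
```

Under this definition, a reduction of 20-42% means the adapted model's Chamfer Distance is between 0.58 and 0.80 times the baseline value.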

Abstract

Geometric foundation models show promise in 3D reconstruction, yet their progress is severely constrained by the scarcity of diverse, large-scale 3D annotations. While Internet videos offer virtually unlimited raw data, utilizing them as a scaling source for geometric learning is challenging due to the absence of ground-truth geometry and the presence of observational noise. To address this, we propose SAGE, a framework for Scalable Adaptation of GEometric foundation models from raw video streams. SAGE leverages a hierarchical mining pipeline to transform videos into training trajectories and hybrid supervision: (1) Informative training trajectory selection; (2) Sparse Geometric Anchoring via SfM point clouds for global structural guidance; and (3) Dense Differentiable Consistency via 3D Gaussian rendering for multi-view constraints. To prevent catastrophic forgetting, we introduce a…
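
The abstract names two supervision signals (sparse SfM anchoring and dense rendering consistency) without giving their loss forms. The sketch below is therefore not the paper's formulation. It only illustrates one common way a sparse term of this kind can be written: SfM depths are known only up to scale, so predicted depths are aligned by a median ratio before an L1 error is taken. The dense term is passed in as a precomputed number, and the function names and weights (`sparse_anchor_loss`, `hybrid_loss`, `w_sparse`, `w_dense`) are hypothetical.

```python
from statistics import median
from typing import Sequence


def sparse_anchor_loss(pred_depths: Sequence[float], sfm_depths: Sequence[float]) -> float:
    """Mean absolute depth error at sparse anchors after median-ratio scale alignment.

    Assumption: both sequences hold positive depths at the same anchor pixels,
    and the prediction is rescaled to the (arbitrary) SfM scale.
    """
    if len(pred_depths) != len(sfm_depths) or not pred_depths:
        raise ValueError("need equally sized, non-empty depth lists")
    if any(p <= 0 for p in pred_depths):
        raise ValueError("predicted depths must be positive")
    scale = median(s / p for p, s in zip(pred_depths, sfm_depths))
    errors = [abs(scale * p - s) for p, s in zip(pred_depths, sfm_depths)]
    return sum(errors) / len(errors)


def hybrid_loss(sparse_term: float, dense_term: float,
                w_sparse: float = 1.0, w_dense: float = 1.0) -> float:
    """Weighted sum of a sparse anchoring term and a dense consistency term."""
    return w_sparse * sparse_term + w_dense * dense_term


# Toy values (hypothetical): the prediction is off by a global factor of 2 only,
# so the scale-aligned sparse error vanishes.
aligned_error = sparse_anchor_loss([1.0, 2.0, 4.0], [2.0, 4.0, 8.0])
total = hybrid_loss(aligned_error, dense_term=0.25, w_sparse=1.0, w_dense=2.0)
```

The median ratio is used here because it tolerates a few outlier anchors, which matters when the sparse points come from noisy video reconstructions.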

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization