Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning

Zhongxiao Cong; Qitao Zhao; Minsik Jeon; Shubham Tulsiani

arXiv:2602.20157·cs.CV·February 24, 2026

TL;DR

Flow3r scales visual geometry learning by supervising it with dense 2D flow from unlabeled videos through a factored flow prediction module, improving performance on static and dynamic scene benchmarks.

Contribution

The paper proposes a novel factored flow prediction module that enhances geometry and motion learning from unlabeled videos, extending to dynamic scenes.

Findings

01. Outperforms alternative flow prediction designs.

02. Performance scales with more unlabeled data.

03. Achieves state-of-the-art results on multiple benchmarks.

Abstract

Current feed-forward 3D/4D reconstruction systems rely on dense geometry and pose supervision, which is expensive to obtain at scale and particularly scarce for dynamic real-world scenes. We present Flow3r, a framework that augments visual geometry learning with dense 2D correspondences ("flow") as supervision, enabling scalable training from unlabeled monocular videos. Our key insight is that the flow prediction module should be factored: predicting flow between two images using geometry latents from one and pose latents from the other. This factorization directly guides the learning of both scene geometry and camera motion, and naturally extends to dynamic scenes. In controlled experiments, we show that factored flow prediction outperforms alternative designs and that performance scales consistently with unlabeled data. Integrating factored flow into existing visual geometry architectures…
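
The factorization described in the abstract mirrors a classical geometric fact: in a static scene, the flow from one image to another is determined by the 3D geometry seen in the first image and the camera pose of the second. The sketch below illustrates only that closed-form relation with NumPy. It is not the paper's module, which operates on learned latents. The function names `backproject` and `induced_flow`, the shared pinhole intrinsics `K`, and the static-scene assumption are all illustrative choices not taken from the source.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map of view i to 3D points in camera-i coordinates.

    Hypothetical helper for illustration; assumes a pinhole camera with
    intrinsics K whose last row is (0, 0, 1).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix_h = np.stack([u, v, np.ones_like(u)], axis=-1)   # (H, W, 3) homogeneous pixels
    rays = pix_h @ np.linalg.inv(K).T                     # K^{-1} p for every pixel
    return rays * depth[..., None]

def induced_flow(points_i, K, R_ij, t_ij):
    """Flow from view i to view j implied by geometry and relative pose.

    points_i : (H, W, 3) per-pixel 3D points of view i (the "geometry" factor).
    R_ij, t_ij : rotation (3, 3) and translation (3,) taking camera-i
                 coordinates to camera-j coordinates (the "pose" factor).
    Assumes a static scene and intrinsics K shared by both views.
    """
    H, W, _ = points_i.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix_i = np.stack([u, v], axis=-1)

    pts_j = points_i @ R_ij.T + t_ij        # move points into camera j
    proj = pts_j @ K.T                      # apply intrinsics
    pix_j = proj[..., :2] / proj[..., 2:3]  # perspective divide
    return pix_j - pix_i                    # (H, W, 2) flow in pixels

# Toy usage: a fronto-parallel plane at depth 2 and a small sideways camera shift.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
points = backproject(np.full((48, 64), 2.0), K)
flow = induced_flow(points, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

In this toy setup every pixel shifts horizontally by the focal length times the translation divided by the depth, so changing either the geometry or the pose changes the flow. That dependence is what lets a flow loss carry a training signal to both factors.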

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques