Flow4R: Unifying 4D Reconstruction and Tracking with Scene Flow

Shenhan Qian, Ganlin Zhang, Shangzhe Wu, Daniel Cremers

arXiv:2602.14021 · cs.CV · February 17, 2026

Open Access

TL;DR

Flow4R introduces a unified, scene-flow-based framework that jointly reconstructs and tracks dynamic 3D scenes, integrating geometry and motion estimation in a single model and outperforming existing methods.

Contribution

The paper presents Flow4R, which uses camera-space scene flow as its core representation, enabling joint 4D reconstruction and tracking without explicit pose regressors or bundle adjustment.

Findings

01

Achieves state-of-the-art results on 4D reconstruction tasks.

02

Handles both static and dynamic scenes by training jointly on static and dynamic datasets.

03

Infers geometry and bidirectional motion in a single forward pass of a Vision Transformer with a shared decoder.

Abstract

Reconstructing and tracking dynamic 3D scenes remains a fundamental challenge in computer vision. Existing approaches often decouple geometry from motion: multi-view reconstruction methods assume static scenes, while dynamic tracking frameworks rely on explicit camera pose estimation or separate motion models. We propose Flow4R, a unified framework that treats camera-space scene flow as the central representation linking 3D structure, object motion, and camera motion. Flow4R predicts a minimal per-pixel property set (3D point position, scene flow, pose weight, and confidence) from two-view inputs using a Vision Transformer. This flow-centric formulation allows local geometry and bidirectional motion to be inferred symmetrically with a shared decoder in a single forward pass, without requiring explicit pose regressors or bundle adjustment. Trained jointly on static and dynamic datasets,…
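The abstract says camera-space scene flow links 3D structure, object motion, and camera motion, and that a per-pixel pose weight is predicted instead of using an explicit pose regressor, but it does not say how camera motion is read out. The sketch below assumes one natural reading: camera motion is recovered by a weighted rigid fit (Kabsch / weighted Procrustes) between each pixel's 3D point and that point displaced by its scene flow, with the pose weight down-weighting moving pixels, and any flow the rigid motion does not explain is treated as object motion. The helper `decompose_flow`, the frame convention (flow maps first-camera coordinates to second-camera coordinates), the array shapes, and the synthetic data are all hypothetical; this is an illustration, not Flow4R's implementation.

```python
import numpy as np


def decompose_flow(points, flow, weights):
    """Split camera-space scene flow into camera motion and object motion.

    Hypothetical helper, not Flow4R's code. Assumed conventions:
    points  (N, 3): per-pixel 3D positions in the first camera's frame.
    flow    (N, 3): camera-space scene flow, so points + flow is where each
                    surface point lies in the second camera's frame.
    weights (N,):   non-negative pose weights, high on pixels whose flow is
                    explained by camera motion alone (static scene).
    Returns R (3, 3) and t (3,) with points + flow ~= points @ R.T + t on
    high-weight pixels, plus the per-pixel residual attributed to object motion.
    """
    src = np.asarray(points, dtype=float)
    dst = src + np.asarray(flow, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the weights sum to one

    # Weighted centroids of the two point sets.
    mu_src = w @ src
    mu_dst = w @ dst

    # Weighted cross-covariance: sum_i w_i (dst_i - mu_dst)(src_i - mu_src)^T.
    cov = (dst - mu_dst).T @ (w[:, None] * (src - mu_src))

    # Kabsch: R = U diag(1, 1, d) V^T from the SVD of cov, where d = -1
    # only when needed to avoid returning a reflection instead of a rotation.
    u, _, vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(u @ vt))
    R = u @ np.diag([1.0, 1.0, d]) @ vt
    t = mu_dst - R @ mu_src

    # Flow not explained by the rigid camera motion is left as object motion.
    object_motion = dst - (src @ R.T + t)
    return R, t, object_motion


def rotation_z(deg):
    """Rotation matrix about the camera's z-axis."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Synthetic two-view example (made-up data): 100 pixels in front of the camera.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(100, 3)) + np.array([0.0, 0.0, 3.0])
R_true = rotation_z(10.0)
t_true = np.array([0.1, -0.2, 0.3])

flow = pts @ R_true.T + t_true - pts    # flow induced by camera motion only
flow[:10] += np.array([0.5, 0.0, 0.0])  # first 10 pixels also move on their own
weights = np.ones(100)
weights[:10] = 0.0                      # assume moving pixels get zero pose weight

R_est, t_est, motion = decompose_flow(pts, flow, weights)
```

Giving a pixel zero weight removes it from the pose fit while still producing its residual, which is how this sketch separates object motion from camera motion. In the assumed setting the weights would come from the network rather than being set by hand.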

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Human Pose and Action Recognition