TL;DR
4RC introduces a unified transformer-based framework for 4D reconstruction from monocular videos, enabling flexible querying of dense geometry and motion at any time and view.
Contribution
4RC proposes a novel encode-once, query-anywhere paradigm that jointly models 4D scene geometry and motion in a compact spatio-temporal latent space.
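The control flow of an encode-once, query-anywhere design can be illustrated with a toy sketch. Everything below is hypothetical and is not 4RC's model or API. `ToyEncodeOnceModel`, `Frame`, and the per-point IDs are illustrative assumptions: the "encoder" only gathers observations from all frames into a shared structure, and the "decoder" answers a (query frame, target time) request by linearly interpolating each point's position. 4RC instead uses a learned transformer encoder and a conditional decoder. What the sketch does capture is the cost structure: the whole video is encoded once, and every later query reuses that result.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Frame:
    t: float                 # capture timestamp of this frame
    points: Dict[int, Vec3]  # toy observation: point id -> 3D position at time t


def _lerp(a: Vec3, b: Vec3, w: float) -> Vec3:
    # Linear interpolation between two 3D points.
    return tuple(x + w * (y - x) for x, y in zip(a, b))


class ToyEncodeOnceModel:
    """Hypothetical stand-in: a cheap 'encoder' plus a query-time 'decoder'."""

    def __init__(self) -> None:
        self.encoder_calls = 0
        self.frames: List[Frame] = []
        self.tracks: Dict[int, List[Tuple[float, Vec3]]] = {}

    def encode(self, video: Sequence[Frame]) -> None:
        """Run once per video: build a shared 'latent' from ALL frames jointly."""
        self.encoder_calls += 1
        self.frames = list(video)
        tracks: Dict[int, List[Tuple[float, Vec3]]] = {}
        for frame in self.frames:
            for pid, pos in frame.points.items():
                tracks.setdefault(pid, []).append((frame.t, pos))
        for obs in tracks.values():
            obs.sort(key=lambda o: o[0])  # time-ordered observations per point
        self.tracks = tracks

    def _position(self, pid: int, t: float) -> Vec3:
        obs = self.tracks[pid]
        k = bisect_right([o[0] for o in obs], t)
        if k == 0:
            return obs[0][1]   # before first observation: clamp (toy choice)
        if k == len(obs):
            return obs[-1][1]  # after last observation: clamp (toy choice)
        (t0, p0), (t1, p1) = obs[k - 1], obs[k]
        return _lerp(p0, p1, (t - t0) / (t1 - t0))

    def query(self, frame_idx: int, t: float) -> Dict[int, Vec3]:
        """Cheap decode: where the points seen in frame `frame_idx` are at time `t`."""
        return {pid: self._position(pid, t) for pid in self.frames[frame_idx].points}


video = [
    Frame(0.0, {0: (0.0, 0.0, 1.0), 1: (1.0, 0.0, 1.0)}),
    Frame(1.0, {0: (0.0, 0.0, 2.0)}),  # point 1 is not observed in this frame
    Frame(2.0, {0: (0.0, 0.0, 3.0), 1: (1.0, 2.0, 1.0)}),
]
model = ToyEncodeOnceModel()
model.encode(video)            # encoder runs exactly once
early = model.query(0, 0.5)    # {0: (0.0, 0.0, 1.5), 1: (1.0, 0.5, 1.0)}
late = model.query(1, 2.0)     # {0: (0.0, 0.0, 3.0)}
```

The design choice being illustrated is amortization. The expensive joint pass over the video is paid once, and each additional (frame, timestamp) query costs only one decoder call.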
Findings
Outperforms prior and concurrent methods across a range of 4D reconstruction tasks.
Efficiently queries 3D geometry and motion for any frame and timestamp.
Represents per-view 4D attributes in a minimally factorized form (base geometry plus time-dependent relative motion) to ease learning.
Abstract
We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere and anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form by decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent…
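The factorization into base geometry and time-dependent relative motion can be read as an additive decomposition, X(t) = base + relative_motion(t). The sketch below assumes that reading; the abstract does not spell out the exact form. `ViewAttributes`, `scene_flow`, and the constant-velocity motion model are hypothetical illustrations, not 4RC's decoder outputs. It also assumes that relative motion is zero at the view's own timestamp, so the base term is the view's geometry at that time.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ViewAttributes:
    """Hypothetical per-view 4D attributes in an additive factorized form."""
    t_ref: float           # timestamp of the view itself
    base: List[Vec3]       # base geometry: one 3D point per pixel at t_ref
    velocity: List[Vec3]   # toy motion model (assumption): constant per-pixel velocity

    def relative_motion(self, t: float) -> List[Vec3]:
        # Time-dependent displacement relative to the base; zero at t_ref.
        dt = t - self.t_ref
        return [(vx * dt, vy * dt, vz * dt) for vx, vy, vz in self.velocity]

    def points_at(self, t: float) -> List[Vec3]:
        # Compose X(t) = base + relative_motion(t), pixel by pixel.
        return [tuple(b + m for b, m in zip(p, d))
                for p, d in zip(self.base, self.relative_motion(t))]


def scene_flow(view: ViewAttributes, t0: float, t1: float) -> List[Vec3]:
    # Under the additive form, the base cancels: flow depends only on motion.
    a, b = view.relative_motion(t0), view.relative_motion(t1)
    return [tuple(y - x for x, y in zip(p, q)) for p, q in zip(a, b)]


view = ViewAttributes(
    t_ref=1.0,
    base=[(0.0, 0.0, 2.0), (1.0, 1.0, 3.0)],
    velocity=[(1.0, 0.0, 0.0), (0.0, 0.0, -1.0)],
)
now = view.points_at(1.0)          # equals the base geometry
later = view.points_at(3.0)        # [(2.0, 0.0, 2.0), (1.0, 1.0, 1.0)]
flow = scene_flow(view, 2.0, 3.0)  # [(1.0, 0.0, 0.0), (0.0, 0.0, -1.0)]
```

One appeal of this kind of split is that the static part of each view is stored once, while only the motion term varies with the target timestamp.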
