Gaussian Sequences with Multi-Scale Dynamics for 4D Reconstruction from Monocular Casual Videos
Can Li, Jie Gu, Jingmin Chen, Fangzhou Qiu, Lei Sun

TL;DR
This paper represents a dynamic scene as sequences of 3D Gaussians whose motion is composed from multiple scales, from object level down to particle level, and adds priors from vision foundation models to constrain 4D reconstruction from casual monocular videos.
Contribution
It proposes a new layered Gaussian sequence model with multi-scale dynamics and integrates vision foundation model priors for enhanced 4D reconstruction from monocular videos.
Findings
Significantly improves 4D reconstruction accuracy.
Achieves more physically plausible dynamic scene modeling.
Demonstrates superior results on benchmark and real-world datasets.
Abstract
Understanding dynamic scenes from casual videos is critical for scalable robot learning, yet four-dimensional (4D) reconstruction under strictly monocular settings remains highly ill-posed. Our key insight is that real-world dynamics exhibit multi-scale regularity, from the object level to the particle level. Building on this, we design a multi-scale dynamics mechanism that factorizes complex motion fields. Within this formulation, we propose Gaussian sequences with multi-scale dynamics, a novel representation for dynamic 3D Gaussians derived through compositions of multi-level motion. This layered structure substantially alleviates the ambiguity of reconstruction and promotes physically plausible dynamics. We further incorporate multi-modal priors from vision foundation models to establish complementary supervision, constraining the solution space and improving the…
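The abstract does not spell out how the multi-level motions are composed, so the following is only a minimal sketch of one possible reading: each Gaussian center is moved by a coarse rigid transform shared by its object, then by a small per-Gaussian residual. The two-level split, the function names (`rotation_z`, `compose_multiscale_motion`), and all array shapes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def rotation_z(theta):
    """Return the 3x3 rotation matrix about the z-axis for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def compose_multiscale_motion(centers, object_ids, obj_rot, obj_trans, particle_offsets):
    """Hypothetical two-level composition: object-level rigid motion plus particle-level residual.

    centers:          (N, 3) canonical Gaussian centers
    object_ids:       (N,)   index of the object each Gaussian is assigned to
    obj_rot:          (K, 3, 3) per-object rotation at one time step
    obj_trans:        (K, 3)    per-object translation at one time step
    particle_offsets: (N, 3) per-Gaussian residual displacement
    """
    R = obj_rot[object_ids]    # gather each Gaussian's object rotation, shape (N, 3, 3)
    t = obj_trans[object_ids]  # gather each Gaussian's object translation, shape (N, 3)
    coarse = np.einsum("nij,nj->ni", R, centers) + t  # object-level motion
    return coarse + particle_offsets                  # add the fine-scale residual


# Toy scene: three Gaussians, the first two on object 0 and the last on object 1.
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
object_ids = np.array([0, 0, 1])
obj_rot = np.stack([rotation_z(np.pi / 2), np.eye(3)])
obj_trans = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
particle_offsets = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.1, 0.0]])

moved = compose_multiscale_motion(centers, object_ids, obj_rot, obj_trans, particle_offsets)
print(moved)
```

In this toy setup the object-level parameters are shared by many Gaussians, so most of the motion is explained by a few coarse variables and the per-Gaussian residuals can stay small. That is one way a layered factorization could reduce ambiguity in the monocular setting, though the paper's actual formulation may differ.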
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Robot Manipulation and Learning
