DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos

Chieh Hubert Lin; Zhaoyang Lv; Songyin Wu; Zhen Xu; Thu Nguyen-Phuoc; Hung-Yu Tseng; Julian Straub; Numair Khan; Lei Xiao; Ming-Hsuan Yang; Yuheng Ren; Richard Newcombe; Zhao Dong; Zhengqin Li

arXiv:2506.09997 · cs.GR · June 12, 2025

Open Access

TL;DR

DGS-LRM is a real-time feed-forward model that predicts deformable 3D Gaussian splats from a posed monocular video of a dynamic scene, supporting both dynamic scene reconstruction and long-range 3D tracking.

Contribution

The paper introduces a per-pixel deformable 3D Gaussian representation, an enhanced large-scale synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision, and a transformer-based model for real-time dynamic scene reconstruction from monocular videos.

Findings

01. Achieves reconstruction quality comparable to optimization-based methods.
02. Outperforms existing predictive methods on real-world dynamic scenes.
03. Enables accurate long-range 3D tracking with physically grounded deformation.

Abstract

We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to static scenes and fail to reconstruct the motion of moving objects. Developing a feed-forward model for dynamic scene reconstruction poses significant challenges, including the scarcity of training data and the need for appropriate 3D representations and training paradigms. To address these challenges, we introduce several key technical contributions: an enhanced large-scale synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision; a per-pixel deformable 3D Gaussian…
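
To make the phrase "per-pixel deformable 3D Gaussian" more concrete, here is a minimal NumPy sketch of one way such a representation could be laid out: each pixel of a posed frame is lifted to a 3D center using a depth value, and a per-timestep translation (a stand-in for scene flow) moves that center over time. This is an illustration only, not the paper's architecture. The function names `unproject_pixels` and `deform_centers`, the pinhole intrinsics `K`, the `cam_to_world` pose convention, and the translation-only deformation are all assumptions made for the example.

```python
import numpy as np

def unproject_pixels(depth, K, cam_to_world):
    """Lift every pixel of an (H, W) depth map to a 3D point in world space.

    Assumes a pinhole camera with intrinsics K (3x3) and a 4x4
    camera-to-world pose. Names and conventions are hypothetical.
    """
    h, w = depth.shape
    # Pixel centers: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                    # camera-space rays with z = 1
    pts_cam = rays * depth[..., None]                  # scale each ray by its depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t                           # (H, W, 3) world-space centers

def deform_centers(centers, translations):
    """Move per-pixel centers with per-timestep translations.

    centers: (H, W, 3); translations: (T, H, W, 3).
    Returns (T, H, W, 3): one set of centers per timestep.
    """
    return centers[None] + translations

# Toy example: a 2x2 frame at constant depth, identity pose, two timesteps.
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((2, 2), 3.0)
centers = unproject_pixels(depth, K, np.eye(4))

translations = np.zeros((2, 2, 2, 3))
translations[1, 0, 0] = [1.0, 0.0, 0.0]   # top-left pixel moves along +x at t = 1
deformed = deform_centers(centers, translations)
```

Because every center stays tied to its source pixel, reading `deformed[:, row, col]` gives that pixel's 3D trajectory across timesteps, which is the sense in which a representation like this could serve tracking as well as rendering.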

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization

Methods: Softmax · Attention Is All You Need