TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos
Jinxi Li, Ziyang Song, Bo Yang

TL;DR
TRACE is a novel framework that models 3D scene dynamics from multi-view videos by learning physical parameters of particles, enabling accurate future frame prediction and object segmentation without human labels.
Contribution
The paper introduces TRACE, which explicitly learns 3D particle-based physical dynamics directly from videos. Prior methods either require additional labels or rely on simplified physics models.
Findings
Outperforms baselines in future frame extrapolation on multiple datasets
Enables object segmentation through clustering learned physical parameters
Successfully models complex 3D scene physics without human labels
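The segmentation finding above can be illustrated with a small sketch. The summary does not say which clustering algorithm TRACE uses, so plain k-means is swapped in here as an assumption; the function names (`dist2`, `kmeans`) and the toy two-dimensional parameter vectors are hypothetical and only stand in for per-particle physical parameters.

```python
def dist2(a, b):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Group parameter vectors into k clusters; returns one label per point.

    Assumption: k-means stands in for whatever clustering the paper uses.
    """
    # Deterministic farthest-point initialisation, so results are repeatable.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every particle to its nearest cluster centre.
        labels = [
            min(range(k), key=lambda j: dist2(p, centers[j])) for p in points
        ]
        # Move each centre to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


# Toy stand-in for learned parameters: two particles that move alike,
# then two more that move alike but differently from the first pair.
params = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = kmeans(params, k=2)
```

Particles that share similar parameters receive the same label, which is the sense in which clustering yields an object segmentation without human masks.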
Abstract
In this paper, we aim to model 3D scene geometry, appearance, and physical information just from dynamic multi-view videos in the absence of any human labels. Existing works, which leverage physics-informed losses as soft constraints or integrate simple physics models into neural nets, often fail to learn complex motion physics, or require additional labels such as object types or masks to do so. We propose a new framework named TRACE to model the motion physics of complex dynamic 3D scenes. The key novelty of our method is that, by formulating each 3D point as a rigid particle with size and orientation in space, we directly learn a translation-rotation dynamics system for each particle, explicitly estimating a complete set of physical parameters to govern the particle's motion over time. Extensive experiments on three existing dynamic datasets and one newly created challenging…
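To make the idea of a per-particle translation-rotation system concrete, here is a minimal sketch of how a rigid particle's pose could be advanced in time once its motion parameters are known. It is not the TRACE formulation: the parameter set (constant linear acceleration and constant angular velocity), the quaternion convention (w, x, y, z), and all function names are assumptions chosen for illustration.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotation_quat(omega, dt):
    """Unit quaternion for spinning at angular velocity omega (rad/s) for dt."""
    speed = math.sqrt(sum(w * w for w in omega))
    angle = speed * dt
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)  # no measurable rotation
    s = math.sin(angle / 2.0) / speed  # scales omega to the unit axis
    return (math.cos(angle / 2.0), omega[0] * s, omega[1] * s, omega[2] * s)


def step(pos, vel, acc, quat, omega, dt):
    """Advance one rigid particle by dt.

    Assumes constant linear acceleration `acc` and constant angular
    velocity `omega` (world frame) over the step.
    """
    new_pos = tuple(p + v * dt + 0.5 * a * dt * dt
                    for p, v, a in zip(pos, vel, acc))
    new_vel = tuple(v + a * dt for v, a in zip(vel, acc))
    new_quat = quat_mul(rotation_quat(omega, dt), quat)
    return new_pos, new_vel, new_quat


# A particle moving along x, accelerating downward in z, spinning about z.
pos, vel, quat = step(
    pos=(0.0, 0.0, 0.0), vel=(1.0, 0.0, 0.0), acc=(0.0, 0.0, -2.0),
    quat=(1.0, 0.0, 0.0, 0.0), omega=(0.0, 0.0, math.pi / 2), dt=1.0,
)
```

Applying `step` repeatedly beyond the last observed frame extrapolates the particle's pose, which is the kind of rollout that future frame prediction relies on.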
