4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding
Mohamed Rayan Barhdadi, Samir Abdaljalil, Rasul Khanbayov, Erchin Serpedin, Hasan Kurban

TL;DR
This paper introduces 4D Synchronized Fields, a novel Gaussian-based representation that jointly models geometry, motion, and semantics for temporal scene understanding, enabling open-vocabulary queries and interpretable motion analysis.
Contribution
The paper proposes a unified 4D Gaussian representation that learns object-factored motion during reconstruction and synchronizes language to the resulting kinematics, improving interpretability and query capabilities.
Findings
Achieves a state-of-the-art PSNR of 28.52 dB on HyperNeRF.
Surpasses previous methods in temporal-state retrieval accuracy and IoU.
Kinematic conditioning significantly improves motion understanding.
Abstract
Current 4D representations decouple geometry, motion, and semantics: reconstruction methods discard interpretable motion structure; language-grounded methods attach semantics after motion is learned, blind to how objects move; and motion-aware methods encode dynamics as opaque per-point residuals without object-level organization. We propose 4D Synchronized Fields, a 4D Gaussian representation that learns object-factored motion in-loop during reconstruction and synchronizes language to the resulting kinematics through a per-object conditioned field. Each Gaussian trajectory is decomposed into shared object motion plus an implicit residual, and a kinematic-conditioned ridge map predicts temporal semantic variation, yielding a single representation in which reconstruction, motion, and semantics are structurally coupled and enabling open-vocabulary temporal queries that retrieve both…
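The two mechanisms named in the abstract, a trajectory split into shared object motion plus a residual and a ridge map from kinematics to semantic variation, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the shared object motion is modeled as one rigid transform per object, the residual is passed in as an explicit array (the paper describes it as implicit, which suggests a learned function), and the "ridge map" is read as closed-form ridge regression. The names gaussian_positions, fit_ridge_map, and lam are hypothetical.

```python
import numpy as np


def rigid_transform(points, rotation, translation):
    """Apply one rigid motion (3x3 rotation, 3-vector translation) to N points."""
    return points @ rotation.T + translation


def gaussian_positions(canonical, object_ids, rotations, translations, residuals):
    """Gaussian centers at one time step: shared object motion plus residual.

    canonical:    (N, 3) centers in the canonical frame
    object_ids:   (N,)   integer object label per Gaussian
    rotations:    (O, 3, 3) assumed per-object rotation at this time step
    translations: (O, 3)    assumed per-object translation at this time step
    residuals:    (N, 3) per-Gaussian offset (explicit here, for illustration)
    """
    moved = np.empty_like(canonical)
    for obj in np.unique(object_ids):
        mask = object_ids == obj
        moved[mask] = rigid_transform(
            canonical[mask], rotations[obj], translations[obj]
        )
    return moved + residuals


def fit_ridge_map(kinematics, semantic_deltas, lam=1e-2):
    """Closed-form ridge regression W = (K^T K + lam I)^{-1} K^T S.

    kinematics:      (T, D) assumed kinematic features per time step
    semantic_deltas: (T, E) assumed change in a semantic embedding per time step
    Returns W of shape (D, E), so that kinematics @ W approximates semantic_deltas.
    """
    dim = kinematics.shape[1]
    gram = kinematics.T @ kinematics + lam * np.eye(dim)
    return np.linalg.solve(gram, kinematics.T @ semantic_deltas)


# Toy usage: object 0 stays put, object 1 rotates 90 degrees about z and shifts.
canonical = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
object_ids = np.array([0, 1])
rot_z90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
rotations = np.stack([np.eye(3), rot_z90])
translations = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
residuals = np.zeros_like(canonical)
positions = gaussian_positions(
    canonical, object_ids, rotations, translations, residuals
)
```

Under these assumptions, Gaussians that share an object label move together, and only the residual term can make them deviate from that shared motion. The ridge penalty lam keeps the kinematics-to-semantics map well conditioned when few time steps are available.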
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Human Pose and Action Recognition
