4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding

Mohamed Rayan Barhdadi; Samir Abdaljalil; Rasul Khanbayov; Erchin Serpedin; Hasan Kurban

arXiv:2603.14301 · cs.CV · March 17, 2026

Open Access

TL;DR

This paper introduces 4D Synchronized Fields, a novel Gaussian-based representation that jointly models geometry, motion, and semantics for temporal scene understanding, enabling open-vocabulary queries and interpretable motion analysis.

Contribution

The paper proposes a unified 4D Gaussian representation that learns object-factored motion and synchronizes language with kinematics during reconstruction, improving interpretability and enabling open-vocabulary temporal queries.
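Object-factored motion means that Gaussians belonging to the same object share one motion, with a small per-Gaussian residual added on top. The sketch below is a minimal illustration under our own assumptions: each object's shared motion is modeled as a rigid transform (R(t), T(t)), and the implicit residual field (in practice a learned network) is replaced by a hypothetical closed-form toy function. The names `static`, `spin_and_slide`, `toy_residual`, and `gaussian_centers` are illustrative and do not come from the paper.

```python
import numpy as np
from typing import Callable, Sequence, Tuple

# A motion maps time t to a rigid transform (R, T) shared by all Gaussians of one object.
Motion = Callable[[float], Tuple[np.ndarray, np.ndarray]]
Residual = Callable[[np.ndarray, float], np.ndarray]


def rotation_z(theta: float) -> np.ndarray:
    """3x3 rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def static(t: float) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical object that never moves."""
    return np.eye(3), np.zeros(3)


def spin_and_slide(t: float) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical object that rotates about z while translating along x."""
    return rotation_z(0.5 * t), np.array([t, 0.0, 0.0])


def toy_residual(mu: np.ndarray, t: float) -> np.ndarray:
    """Placeholder for the implicit per-Gaussian residual (a learned field in practice)."""
    return 0.01 * np.sin(t) * mu


def gaussian_centers(mu_canon: np.ndarray, obj_ids: np.ndarray, t: float,
                     motions: Sequence[Motion],
                     residual: Residual = toy_residual) -> np.ndarray:
    """Gaussian centers at time t: shared object motion plus per-Gaussian residual."""
    out = np.empty_like(mu_canon)
    for k, motion in enumerate(motions):
        mask = obj_ids == k              # Gaussians assigned to object k
        R, T = motion(t)                 # one transform shared by the whole object
        out[mask] = mu_canon[mask] @ R.T + T
    return out + residual(mu_canon, t)


mu = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
obj_ids = np.array([0, 1, 1])  # Gaussian 0 -> static object; 1 and 2 -> moving object
print(gaussian_centers(mu, obj_ids, 2.0, [static, spin_and_slide]))
```

Because all Gaussians of an object share a single transform, the motion is readable per object (for example, "object 1 rotates and slides"), while the residual absorbs non-rigid detail that the shared motion cannot explain.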

Findings

01. Achieves state-of-the-art PSNR on the HyperNeRF benchmark (28.52 dB).

02. Surpasses previous methods in temporal-state retrieval accuracy and IoU.

03. Kinematic conditioning significantly improves motion understanding.
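The findings report image quality as PSNR and retrieval quality as IoU. For readers less familiar with these metrics, here is a minimal sketch of their standard definitions; it assumes images scaled to [0, 1] and binary masks, and it does not reproduce the paper's exact evaluation protocol (for example, how masks are derived from temporal-state queries or how scores are averaged).

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(inter) / float(union)


# Inverting the PSNR formula: on [0, 1]-scaled images, 28.52 dB implies MSE = 10^(-2.852).
print(10 ** (-28.52 / 10))
print(psnr(np.full((4, 4), 0.1), np.zeros((4, 4))))
print(mask_iou(np.array([1, 1, 0, 0], bool), np.array([0, 1, 1, 0], bool)))
```

Under the [0, 1] scaling assumption, the reported 28.52 dB corresponds to a mean squared error of roughly 1.4e-3 per pixel channel. Because PSNR is logarithmic, each additional 1 dB means the MSE shrinks by a factor of about 1.26.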

Abstract

Current 4D representations decouple geometry, motion, and semantics: reconstruction methods discard interpretable motion structure; language-grounded methods attach semantics after motion is learned, blind to how objects move; and motion-aware methods encode dynamics as opaque per-point residuals without object-level organization. We propose 4D Synchronized Fields, a 4D Gaussian representation that learns object-factored motion in-loop during reconstruction and synchronizes language to the resulting kinematics through a per-object conditioned field. Each Gaussian trajectory is decomposed into shared object motion plus an implicit residual, and a kinematic-conditioned ridge map predicts temporal semantic variation, yielding a single representation in which reconstruction, motion, and semantics are structurally coupled and enabling open-vocabulary temporal queries that retrieve both…
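The abstract states that a kinematic-conditioned ridge map predicts how an object's semantics vary over time. The sketch below is a minimal illustration under our own reading, not the authors' method: it treats the "ridge map" as a linear map fit by closed-form ridge regression from per-object kinematic features (here only speed) to offsets of a semantic feature. The names `fit_ridge_map`, `semantic_at`, and `retrieve_time`, the speed feature, and the 3-D toy embeddings (stand-ins for real language features) are all hypothetical. A temporal query then returns the time step whose predicted feature has the highest cosine similarity to the query embedding.

```python
import numpy as np


def fit_ridge_map(K: np.ndarray, S: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Closed-form ridge regression: W = argmin ||K W - S||^2 + lam * ||W||^2.

    K: (T, d_k) kinematic features of one object at T time steps.
    S: (T, d_s) semantic-feature offsets observed at those time steps.
    """
    d_k = K.shape[1]
    return np.linalg.solve(K.T @ K + lam * np.eye(d_k), K.T @ S)


def semantic_at(base: np.ndarray, W: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Time-varying semantic feature: static object embedding plus kinematics-driven offset."""
    return base + k @ W


def retrieve_time(query: np.ndarray, base: np.ndarray, W: np.ndarray, K: np.ndarray) -> int:
    """Index of the time step whose predicted feature best matches the query (cosine)."""
    feats = semantic_at(base, W, K)                      # (T, d_s)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return int(np.argmax(feats @ q))


# Toy data: the object moves (speed 1) only at time steps 2 and 3.
K = np.array([[0.0], [0.0], [1.0], [1.0], [0.0]])
base = np.array([1.0, 0.0, 0.0])         # toy "object identity" embedding
S = K * np.array([0.0, 1.0, 0.0])        # toy "moving" direction scales with speed
W = fit_ridge_map(K, S)
print(retrieve_time(np.array([0.2, 1.0, 0.0]), base, W, K))  # query leaning toward "moving"
print(retrieve_time(np.array([1.0, 0.0, 0.0]), base, W, K))  # query matching the static identity
```

The regularizer `lam` keeps the map well-conditioned when kinematic features are few or collinear. Conditioning the semantic offset on kinematics, rather than on time directly, is what lets a query about a motion state land on the frames where that motion actually occurs.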

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Motion and Animation · Human Pose and Action Recognition