Director: Instance-aware Gaussian Splatting for Dynamic Scene Modeling and Understanding
Yuheng Jiang, Yiwen Cai, Zihao Wang, Yize Wu, Sicheng Li, Zhuo Su, Shaohui Jiao, Lan Xu

TL;DR
Director introduces a unified 4D Gaussian representation for dynamic scene modeling that incorporates instance-level semantics and language alignment, improving scene decomposition and understanding.
Contribution
Director presents a spatio-temporal Gaussian model that integrates instance-level semantics and language supervision for stable, instance-aware dynamic scene reconstruction.
Findings
Achieves temporally coherent 4D scene reconstructions.
Enables instance segmentation and open-vocabulary querying.
Reduces drift and enhances temporal stability in dynamic scenes.
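As a rough illustration of what open-vocabulary querying over per-Gaussian semantic features can look like, the sketch below selects Gaussians whose feature vector is close to a text embedding under cosine similarity. It is not the paper's implementation: the function names, the 3-D toy features, the query vector, and the 0.8 threshold are all assumptions made for this example, and a real system would take its embeddings from a language model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def query_gaussians(features, text_embedding, threshold=0.8):
    """Return indices of Gaussians whose feature is similar to the query.

    `features` holds one semantic feature vector per Gaussian (hypothetical
    layout); `threshold` is an assumed cutoff, not a value from the paper.
    """
    return [
        i for i, feature in enumerate(features)
        if cosine(feature, text_embedding) >= threshold
    ]

# Toy data: four Gaussians with made-up 3-D semantic features.
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
text_embedding = [1.0, 0.0, 0.0]  # stand-in for an embedded text query

selected = query_gaussians(features, text_embedding)
print(selected)
```

In this toy setup the first two Gaussians share a direction with the query and are selected, while the other two are orthogonal to it and are left out.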
Abstract
Volumetric video seeks to model dynamic scenes as temporally coherent 4D representations. While recent Gaussian-based approaches achieve impressive rendering fidelity, they primarily emphasize appearance but are largely agnostic to instance-level structure, limiting stable tracking and semantic reasoning in highly dynamic scenarios. In this paper, we present Director, a unified spatio-temporal Gaussian representation that jointly models human performance, high-fidelity rendering, and instance-level semantics. Our key insight is that embedding instance-consistent semantics naturally complements 4D modeling, enabling more accurate scene decomposition while supporting robust dynamic scene understanding. To this end, we leverage temporally aligned instance masks and sentence embeddings derived from Multimodal Large Language Models to supervise the learnable semantic features of each…
