STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering
Zehao Li, Hao Jiang, Yujun Cai, Jianing Chen, Baolong Bi, Shuqin Gao, Honglong Zhao, Yiwei Wang, Tianlu Mao, Zhaoqi Wang

TL;DR
This paper introduces STDR, a module that decouples spatial and temporal information in 3D Gaussian Splatting, significantly improving real-time dynamic scene reconstruction quality and consistency.
Contribution
The paper proposes a novel spatio-temporal decoupling module for 3D Gaussian Splatting, enhancing dynamic scene reconstruction by disentangling spatial and temporal features.
Findings
Improved spatio-temporal coherence in dynamic scene rendering.
Enhanced reconstruction quality on synthetic and real-world data.
Compatible with existing 3D Gaussian Splatting frameworks.
Abstract
Although dynamic scene reconstruction has long been a fundamental challenge in 3D vision, the recent emergence of 3D Gaussian Splatting (3DGS) offers a promising direction by enabling high-quality, real-time rendering through explicit Gaussian primitives. However, existing 3DGS-based methods for dynamic reconstruction often suffer from *spatio-temporal incoherence* during initialization, where canonical Gaussians are constructed by aggregating observations from multiple frames without temporal distinction. This results in spatio-temporally entangled representations, making it difficult to model dynamic motion accurately. To overcome this limitation, we propose **STDR** (Spatio-Temporal Decoupling for Real-time rendering), a plug-and-play module that learns spatio-temporal probability distributions for each Gaussian. STDR introduces a spatio-temporal mask, a separated…
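The abstract is cut off before the mask is described, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It follows one reviewer's reading (in the peer reviews below) that each Gaussian carries a learnable vector with one entry per training frame. Here each entry is a hypothetical presence logit whose sigmoid is treated as the probability that the Gaussian exists at that frame, and that probability scales the Gaussian's ordinary 3DGS opacity. The names `sigmoid`, `effective_opacities`, `opacity_logits`, and `mask_logits` are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def effective_opacities(opacity_logits, mask_logits, t):
    """Hypothetical per-frame opacity for every Gaussian at frame index t.

    opacity_logits: N floats, the time-independent base opacity (as in 3DGS).
    mask_logits:    N lists of T floats, an assumed per-Gaussian, per-frame
                    presence logit (N x T extra parameters in total).
    Returns N opacities: base opacity times the presence probability at t.
    """
    return [
        sigmoid(o) * sigmoid(m[t])
        for o, m in zip(opacity_logits, mask_logits)
    ]


# Usage: Gaussian 0 persists in all 3 frames; Gaussian 1 exists only in frame 0.
opacity_logits = [0.0, 0.0]
mask_logits = [[10.0, 10.0, 10.0], [10.0, -10.0, -10.0]]
for t in range(3):
    print(t, [round(a, 3) for a in effective_opacities(opacity_logits, mask_logits, t)])
```

Because the presence probability only multiplies opacity, a Gaussian that the mask switches off at frame t simply stops contributing to that frame's rasterization while keeping its geometry for other frames.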
Peer Reviews
Decision · Submitted to ICLR 2026
- The paper is well-written and the proposed method is clearly explained.
- The proposed method is a plug-and-play module that can be easily integrated. When applied to different baseline methods (such as Deformable3D, SCGS, and SPGS), the method shows compatibility and effectiveness, consistently increasing the qualitative performance.
- The novelty of the proposed spatio-temporal probability distribution is somewhat limited. In related works such as 4DGS, the opacity of the Gaussians is also modulated by the temporal dimension, and the standard deviation reflects the persistence of the Gaussian. The proposed method seems to be a discretized version where a long vector whose length is proportional to the number of frames has to be introduced for every single Gaussian.
- Rendering efficiency is not reported. By introducing the
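The reviewer's contrast between a continuous temporal opacity profile and a per-frame vector can be made concrete with a simplified sketch. This is an assumption-labeled illustration, not the 4DGS or STDR code: the continuous variant uses a hypothetical 1-D Gaussian in time with centre `mu_t` and width `sigma_t` (larger `sigma_t` meaning longer persistence, as the review describes), and `extra_temporal_params` counts the additional temporal parameters each design needs per scene. All function names are illustrative.

```python
import math


def temporal_opacity(base_opacity: float, t: float, mu_t: float, sigma_t: float) -> float:
    """Hypothetical continuous profile: opacity fades as a 1-D Gaussian in time.

    mu_t is when the Gaussian is most visible; a larger sigma_t means it
    persists over a longer time span. Unnormalized, so the peak equals
    base_opacity.
    """
    if sigma_t <= 0:
        raise ValueError("sigma_t must be positive")
    return base_opacity * math.exp(-0.5 * ((t - mu_t) / sigma_t) ** 2)


def extra_temporal_params(num_gaussians: int, num_frames: int) -> dict:
    """Extra temporal parameters per scene for the two designs.

    continuous:       2 per Gaussian (mu_t, sigma_t), independent of length.
    per_frame_vector: one entry per Gaussian per frame, grows with length.
    """
    return {
        "continuous": 2 * num_gaussians,
        "per_frame_vector": num_gaussians * num_frames,
    }


# Usage: opacity peaks at t == mu_t, and the parameter gap for a long sequence.
print(temporal_opacity(0.8, t=0.5, mu_t=0.5, sigma_t=0.1))
print(extra_temporal_params(100_000, 150))
```

The trade-off these two toy functions expose: a per-frame vector can represent arbitrary on/off patterns, but its size grows linearly with the number of frames, whereas the two-parameter profile stays compact yet can only express a single contiguous lifetime.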
1. Decoupling spatial and temporal modeling makes sense, but I think the approach lacks novelty, given that many existing approaches already address this issue [A]; it is easy to find numerous methods that decouple spatial and temporal modeling.
2. The proposed approach is plug-and-play and can be applied to Deformable3DGS, SC-GS, and SPGS on separate benchmarks.
[A] SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction
1. My first concern is that this paper evaluates plug-and-play integration on only a few approaches, and those baselines are not representative of the different lines of work. I do not think this approach is applicable to all previous approaches; could you explain which kinds of methods are suitable for STDR?
2. In 4DGS, some works also demonstrate a plug-and-play property. Could you discuss the differences? [C]
3. It is confusing that this does not apply the proposed technique to Spatial-Temporal
1. Performance Gains: The method improves reconstruction quality across both synthetic and real-world datasets, outperforming reported baselines such as SP-GS and DeformGS.
2. Integration: The module is **plug-and-play** and compatible with multiple existing 3DGS pipelines.
1. Inaccurate or Overgeneralized Claims
   * The statement *“existing 3DGS-based methods typically adopt a two-stage pipeline”* is not universally true. Several recent dynamic Gaussian methods (e.g., 4DGS) employ different initialization strategies.
   * The claim that the proposed masks *“reflect the true dynamics of the scene”* is too strong: if the temporal sampling rate of the training data is limited, this convergence cannot guarantee ground-truth dynamic fidelity.
   * The assertion "fi
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques
