Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation
Seungjun Oh, Younggeun Lee, Hyejin Jeon, Eunbyung Park

TL;DR
This paper introduces a hybrid 3D-4D Gaussian Splatting framework for dynamic scene reconstruction that reduces computational cost by adaptively representing static regions with 3D Gaussians and dynamic regions with 4D Gaussians, while maintaining high visual quality.
Contribution
The proposed method adaptively combines 3D and 4D Gaussian representations, significantly improving efficiency over existing 4D Gaussian Splatting techniques.
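To render a hybrid scene at a given time, both kinds of primitive have to end up as ordinary 3D Gaussians. The minimal sketch below assumes, as in the 4DGS baseline, that each 4D Gaussian is sliced by conditioning its (x, y, z, t) distribution on the query time, while 3D Gaussians are used unchanged. The function names and the (mean, covariance, weight) tuple layout are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def slice_4d_gaussian(mu4, cov4, t):
    """Condition a 4D (x, y, z, t) Gaussian on time t (assumed 4DGS-style slicing).

    Returns the 3D mean and covariance of the spatial slice plus the
    unnormalized temporal weight that would scale the Gaussian's opacity at t.
    """
    mu_x, mu_t = mu4[:3], mu4[3]
    cov_xx = cov4[:3, :3]   # spatial block
    cov_xt = cov4[:3, 3]    # space-time cross terms
    cov_tt = cov4[3, 3]     # temporal variance
    # Standard Gaussian conditioning: the mean shifts linearly with t.
    mu_cond = mu_x + cov_xt * (t - mu_t) / cov_tt
    cov_cond = cov_xx - np.outer(cov_xt, cov_xt) / cov_tt
    # Unnormalized temporal marginal: how strongly the Gaussian is present at t.
    weight = np.exp(-0.5 * (t - mu_t) ** 2 / cov_tt)
    return mu_cond, cov_cond, weight

def gaussians_at_time(static3d, dynamic4d, t):
    """Hypothetical helper: gather renderable 3D Gaussians for time t."""
    # 3D (static) Gaussians are time-independent, so their weight is always 1.
    out = [(mu, cov, 1.0) for mu, cov in static3d]
    # 4D (dynamic) Gaussians must be sliced at every rendered time step.
    out += [slice_4d_gaussian(mu, cov, t) for mu, cov in dynamic4d]
    return out
```

Because the conditional mean moves linearly in t, one 4D Gaussian can represent straight-line motion over its temporal extent, whereas static 3D Gaussians skip the per-time slicing entirely.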
Findings
Faster training times compared to baseline 4D methods
Maintains or improves visual quality in dynamic scene synthesis
Reduces memory overhead by converting static Gaussians to 3D
Abstract
Recent advancements in dynamic 3D scene reconstruction have shown promising results, enabling high-fidelity 3D novel view synthesis with improved temporal consistency. Among these, 4D Gaussian Splatting (4DGS) has emerged as an appealing approach due to its ability to model high-fidelity spatial and temporal variations. However, existing methods suffer from substantial computational and memory overhead due to the redundant allocation of 4D Gaussians to static regions, which can also degrade image quality. In this work, we introduce hybrid 3D-4D Gaussian Splatting (3D-4DGS), a novel framework that adaptively represents static regions with 3D Gaussians while reserving 4D Gaussians for dynamic elements. Our method begins with a fully 4D Gaussian representation and iteratively converts temporally invariant Gaussians into 3D, significantly reducing the number of parameters and improving…
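One way to read the conversion step, consistent with the reviewer's description below of "discarding the temporal dimension", is to marginalize each selected 4D Gaussian over time: the spatial marginal of a Gaussian keeps the first three mean entries and the top-left 3×3 covariance block. This is a minimal NumPy sketch under that assumption; the batched array layout, the boolean mask input, and the name `convert_static` are hypothetical, and how often the paper runs the conversion during training is not modeled here.

```python
import numpy as np

def convert_static(mus4, covs4, is_static):
    """Split a batch of 4D Gaussians into a 3D (static) set and a 4D (dynamic) set.

    mus4:      (N, 4) means over (x, y, z, t)
    covs4:     (N, 4, 4) space-time covariances
    is_static: (N,) boolean mask of Gaussians to convert (hypothetical input)
    """
    # Marginalizing over t keeps the spatial part of the mean...
    mus3 = mus4[is_static, :3]
    # ...and the spatial 3x3 covariance block; the t row and column are dropped.
    covs3 = covs4[is_static, :3, :3]
    # Gaussians not flagged as static keep their full 4D parameters.
    dynamic = (mus4[~is_static], covs4[~is_static])
    return (mus3, covs3), dynamic
```

Each converted Gaussian loses its temporal mean, temporal variance, and space-time cross terms, which is where the per-Gaussian parameter savings for static regions come from in this sketch.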
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. This paper is well-written and easy to follow. 2. The idea of modeling the dynamic and static parts of a scene separately is very intuitive.
1. Lack of Novelty. The representations for modeling dynamic and static objects (3DGS/4DGS) are borrowed from existing works, and the conversion between them simply discards the temporal dimension. There is no unique insight. 2. Subtle Performance Improvements. The quantitative improvements over existing SOTA methods (even the 4DGS baseline) are limited, and the differences are difficult to see in the qualitative comparison. 3. Evaluations on Decoupling Static/Dynamic E
The paper provides a clear and important insight: modeling a hybrid scene containing both static and dynamic components should utilize a hybrid representation, such as 3D-4D Gaussian Splatting. This is an underexplored problem that the authors have identified and effectively addressed. Previous approaches, including 4DGS, StreamGS, and STGS, rely on temporal Gaussians to represent the entire scene, overlooking the distinction between static and dynamic regions. This paper highlights this gap and
1. The hyperparameter \(\tau\), which controls the identification of static Gaussians, was carefully tuned. As shown in Table 4, even a slightly higher or lower value results in performance degradation, making the method perform worse than 4D Gaussians. This highlights a lack of robustness in the approach. For scenes with varying characteristics, such as differing ratios of dynamic components or varying motion amplitudes, \(\tau\) would need to be adjusted accordingly. If not properly selected,
1. The paper addresses a practical and significant problem in the 4DGS domain: the redundant representation of static regions, which leads to high computational and memory overhead. 2. Compared to the 4DGS baseline, the proposed method shows improvements in training speed, storage, and reconstruction quality.
1. The core mechanism for distinguishing static from dynamic content relies on the temporal scale $s_t$ and a hard threshold $\tau$, which is fundamentally a heuristic approach. This threshold $\tau$ requires manual tuning for different datasets and sequence lengths (e.g., 3.0 for N3V 10s, 6.0 for 40s, and 1.0 for Technicolor). This suggests the hyperparameter may be highly sensitive to the scene's motion characteristics and lacks generalizability. 2. The improvements appear marginal. As shown
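The heuristic described here reduces to a per-Gaussian comparison against one scalar. This sketch assumes that s_t is stored as a log-scale parameter and exponentiated before the test, and that a larger s_t marks a Gaussian that persists across the sequence and is therefore treated as static; both are assumptions, and the dictionary keys are illustrative labels for the τ values quoted in the review.

```python
import numpy as np

# tau values quoted in the review; the keys are illustrative labels only.
TAU = {"N3V-10s": 3.0, "N3V-40s": 6.0, "Technicolor": 1.0}

def static_mask(log_scale_t, tau):
    """Return a boolean mask of Gaussians classified as static.

    Assumes s_t = exp(log_scale_t) and that s_t > tau means the Gaussian
    spans enough of the sequence to be represented in 3D.
    """
    s_t = np.exp(log_scale_t)  # temporal scale in linear units
    return s_t > tau           # hard threshold: no margin, no soft weighting
```

Because the decision is a hard cut at τ, Gaussians whose s_t lies near τ switch between 3D and 4D under small changes to τ, which is one way to read the sensitivity both reviewers point to.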
- Adaptive Hybrid Representation: Dynamically classifies Gaussians to balance computational cost and fidelity by modeling static regions in 3D and dynamic regions in full 4D.
- Fast Training and Memory Efficiency: Removes redundant temporal parameters for static Gaussians, achieving significantly shorter training times and reduced memory consumption without sacrificing quality.
- High-Fidelity Dynamic Modeling: Retains full 4D Gaussians for genuinely dynamic content, enabling accurate captur
- The proposed method tends to train many dynamic 4D Gaussians with very small temporal variation scales, which effectively leads to overparameterization. This structural characteristic allows the model to **"memorize" visual content of the scene rather than truly learning meaningful motion dynamics.** The observed acceleration in training speed can ironically be interpreted as evidence of this overparameterization; instead of efficient spatiotemporal modeling, the network is fitting redundant
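The overparameterization argument rests on how long a single 4D Gaussian stays visible. Assuming the usual 4DGS form, in which a Gaussian's opacity at time t is scaled by exp(-0.5 ((t - μ_t)/σ_t)²), the interval where that weight stays above a level w has half-width σ_t·sqrt(-2 ln w). Here σ_t is treated as a temporal standard deviation and `active_window` is a hypothetical helper; how σ_t maps onto the paper's s_t is an assumption.

```python
import math

def active_window(mu_t, sigma_t, min_weight=0.5):
    """Time interval where a 4D Gaussian's temporal weight exceeds min_weight.

    The weight is assumed to be exp(-0.5 * ((t - mu_t) / sigma_t) ** 2), so it
    stays above w while |t - mu_t| < sigma_t * sqrt(-2 ln w).
    """
    half = sigma_t * math.sqrt(-2.0 * math.log(min_weight))
    return mu_t - half, mu_t + half
```

For example, with σ_t = 0.01 s and w = 0.5 the window is about 0.024 s wide, shorter than one frame period at 30 fps (about 0.033 s), so such a Gaussian is strongly visible in at most one frame, which is the per-frame "memorization" behavior the reviewer describes.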
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
