Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation

Seungjun Oh; Younggeun Lee; Hyejin Jeon; Eunbyung Park

arXiv:2505.13215 · cs.CV · May 20, 2025



TL;DR

This paper introduces a hybrid 3D-4D Gaussian Splatting framework for dynamic scene reconstruction that reduces computational costs by adaptively representing static regions with 3D Gaussians and dynamic regions with 4D Gaussians, maintaining high visual quality.

Contribution

The proposed method adaptively combines 3D and 4D Gaussian representations, significantly improving efficiency over existing 4D Gaussian Splatting techniques.

Findings

01 Faster training times compared to baseline 4D methods

02 Maintains or improves visual quality in dynamic scene synthesis

03 Reduces memory overhead by converting static Gaussians to 3D

Abstract

Recent advancements in dynamic 3D scene reconstruction have shown promising results, enabling high-fidelity 3D novel view synthesis with improved temporal consistency. Among these, 4D Gaussian Splatting (4DGS) has emerged as an appealing approach due to its ability to model high-fidelity spatial and temporal variations. However, existing methods suffer from substantial computational and memory overhead due to the redundant allocation of 4D Gaussians to static regions, which can also degrade image quality. In this work, we introduce hybrid 3D-4D Gaussian Splatting (3D-4DGS), a novel framework that adaptively represents static regions with 3D Gaussians while reserving 4D Gaussians for dynamic elements. Our method begins with a fully 4D Gaussian representation and iteratively converts temporally invariant Gaussians into 3D, significantly reducing the number of parameters and improving…
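The conversion step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes, following the reviewers' description further down this page, that each 4D Gaussian carries a temporal scale $s_t$ and is treated as static once that scale reaches a threshold $\tau$. The function names, the (x, y, z, t) column layout, and the direction of the comparison are assumptions made for this example.

```python
import numpy as np


def split_static_dynamic(temporal_scale, tau):
    """Return boolean masks (static, dynamic), one entry per Gaussian.

    Assumption: a Gaussian whose temporal scale is at least tau covers
    the sequence broadly enough to be treated as time-invariant.
    """
    temporal_scale = np.asarray(temporal_scale, dtype=float)
    static = temporal_scale >= tau
    return static, ~static


def to_3d(means_4d, static):
    """Keep only (x, y, z) for the static Gaussians, dropping the t column."""
    means_4d = np.asarray(means_4d, dtype=float)
    return means_4d[static, :3]


# Hypothetical toy data: three Gaussians with (x, y, z, t) means.
means = np.array([
    [0.0, 0.0, 0.0, 0.5],
    [1.0, 1.0, 1.0, 0.2],
    [2.0, 2.0, 2.0, 0.9],
])
s_t = [5.0, 0.3, 3.0]

static_mask, dynamic_mask = split_static_dynamic(s_t, tau=3.0)
static_means_3d = to_3d(means, static_mask)    # stored without a time axis
dynamic_means_4d = means[dynamic_mask]         # keep the full 4D parameters
```

In this toy setup the static Gaussians lose their temporal coordinate, which is the kind of parameter saving the abstract refers to; a real pipeline would also have to convert the 4D covariance and any time-dependent appearance terms, which this sketch leaves out.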

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. This paper is well-written and easy to follow.
2. The idea of modeling the dynamic and static parts of a scene separately is very intuitive.

Weaknesses

1. Lack of Novelty. The representations for modeling dynamic and static objects, 3D/4DGS, are borrowed from existing works, and the conversion between them is simply the result of discarding the temporal dimension. There is no unique insight.
2. Subtle Performance Improvements. The quantitative improvements over existing SOTA methods (even the baseline 4DGS) are limited. The differences are also difficult to see in the qualitative comparison.
3. Evaluations on Decoupling Static/Dynamic E

Reviewer 02 · Rating 4 · Confidence 5

Strengths

The paper provides a clear and important insight: modeling a hybrid scene containing both static and dynamic components should utilize a hybrid representation, such as 3D-4D Gaussian Splatting. This is an underexplored problem that the authors have identified and effectively addressed. Previous approaches, including 4DGS, StreamGS, and STGS, rely on temporal Gaussians to represent the entire scene, overlooking the distinction between static and dynamic regions. This paper highlights this gap and

Weaknesses

1. The hyperparameter $\tau$, which controls the identification of static Gaussians, was carefully tuned. As shown in Table 4, even a slightly higher or lower value results in performance degradation, making the method perform worse than 4D Gaussians. This highlights a lack of robustness in the approach. For scenes with varying characteristics, such as differing ratios of dynamic components or varying motion amplitudes, $\tau$ would need to be adjusted accordingly. If not properly selected,

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The paper addresses a practical and significant problem in the 4DGS domain: the redundant representation of static regions, which leads to high computational and memory overhead.
2. Compared to the 4DGS baseline, the proposed method shows improvements in training speed, storage, and reconstruction quality.

Weaknesses

1. The core mechanism for distinguishing static from dynamic content relies on the temporal scale $s_t$ and a hard threshold $\tau$, which is fundamentally a heuristic approach. This threshold $\tau$ requires manual tuning for different datasets and sequence lengths (e.g., 3.0 for N3V 10s, 6.0 for 40s, and 1.0 for Technicolor). This suggests the hyperparameter may be highly sensitive to the scene's motion characteristics and lacks generalizability.
2. The improvements appear marginal. As shown

Reviewer 04 · Rating 2 · Confidence 5

Strengths

- Adaptive Hybrid Representation: Dynamically classifies Gaussians to balance computational cost and fidelity by modeling static regions in 3D and dynamic regions in full 4D.
- Fast Training and Memory Efficiency: Removes redundant temporal parameters for static Gaussians, achieving significantly shorter training times and reduced memory consumption without sacrificing quality.
- High-Fidelity Dynamic Modeling: Retains full 4D Gaussians for genuinely dynamic content, enabling accurate captur

Weaknesses

- The proposed method tends to train many dynamic 4D Gaussians with very small temporal variation scales, which effectively leads to overparameterization. This structural characteristic allows the model to **"memorize" visual content of the scene rather than truly learning meaningful motion dynamics.** The observed acceleration in training speed can ironically be interpreted as evidence of this overparameterization; instead of efficient spatiotemporal modeling, the network is fitting redundant

Code & Models

Repositories

ohsngjun/3D-4DGS
Official


Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis