Splat4D: Diffusion-Enhanced 4D Gaussian Splatting for Temporally and Spatially Consistent Content Creation

Minghao Yin; Yukang Cao; Songyou Peng; Kai Han

arXiv:2508.07557 · cs.CV · August 12, 2025

TL;DR

Splat4D is a framework that generates high-quality, temporally and spatially consistent 4D content from monocular videos, using diffusion models and multi-view rendering, for applications such as digital humans and AR/VR.

Contribution

The paper introduces Splat4D, which combines multi-view rendering, inconsistency identification, a video diffusion model, and an asymmetric U-Net for refinement to achieve state-of-the-art 4D content generation from monocular videos.

Findings

01. Achieves state-of-the-art performance across various metrics on public benchmarks.

02. Demonstrates versatility in text/image conditioned 4D generation.

03. Produces coherent 4D content for applications such as digital humans and AR/VR.

Abstract

Generating high-quality 4D content from monocular videos for applications such as digital humans and AR/VR poses challenges in ensuring temporal and spatial consistency, preserving intricate details, and incorporating user guidance effectively. To overcome these challenges, we introduce Splat4D, a novel framework enabling high-fidelity 4D content generation from a monocular video. Splat4D achieves superior performance while maintaining faithful spatial-temporal coherence by leveraging multi-view rendering, inconsistency identification, a video diffusion model, and an asymmetric U-Net for refinement. Through extensive evaluations on public benchmarks, Splat4D consistently demonstrates state-of-the-art performance across various metrics, underscoring the efficacy of our approach. Additionally, the versatility of Splat4D is validated in various applications such as text/image conditioned…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Video Analysis and Summarization