VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction

Ziyue Zhu; Shenlong Wang; Jin Xie; Jiang-jiang Liu; Jingdong Wang; Jian Yang

arXiv:2506.05563·cs.CV·June 9, 2025

TL;DR

VoxelSplat is a regularization framework that uses 3D Gaussian Splatting to improve semantic occupancy and scene flow prediction. It adds enhanced supervision and self-supervised learning during training, without increasing inference time.

Contribution

The paper proposes VoxelSplat, a regularization framework that improves 3D semantic occupancy and scene flow prediction by using 3D Gaussian Splatting and 2D projection supervision during training.

Findings

01 Improves accuracy of semantic occupancy prediction.

02 Enhances scene flow estimation performance.

03 Seamlessly integrates into existing models without extra inference cost.

Abstract

Recent advancements in camera-based occupancy prediction have focused on the simultaneous prediction of 3D semantics and scene flow, a task that presents significant challenges due to specific difficulties, e.g., occlusions and unbalanced dynamic environments. In this paper, we analyze these challenges and their underlying causes. To address them, we propose a novel regularization framework called VoxelSplat. This framework leverages recent developments in 3D Gaussian Splatting to enhance model performance in two key ways: (i) Enhanced Semantics Supervision through 2D Projection: During training, our method decodes sparse semantic 3D Gaussians from 3D representations and projects them onto the 2D camera view. This provides additional supervision signals in the camera-visible space, allowing 2D labels to improve the learning of 3D semantics. (ii) Scene Flow Learning: Our framework uses…
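
The 2D projection supervision in point (i) can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes Gaussians already expressed in the camera frame, replaces the full anisotropic covariance projection of 3D Gaussian Splatting with a fixed isotropic screen-space footprint, and uses hypothetical names (`project_points`, `splat_semantics`, `projection_loss`, `sigma_px`) chosen only for illustration.

```python
import numpy as np

def project_points(centers_cam, K):
    """Pinhole projection of 3D points (camera frame, z forward) to pixels."""
    z = centers_cam[:, 2:3]
    uv = (centers_cam @ K.T)[:, :2] / z  # u = fx*x/z + cx, v = fy*y/z + cy
    return uv, z[:, 0]

def splat_semantics(centers_cam, opacities, sem_probs, K, hw, sigma_px=1.5):
    """Render per-pixel class scores by front-to-back alpha compositing.

    Assumption: each Gaussian has an isotropic footprint of sigma_px pixels,
    a simplification of the projected 2D covariance used in real splatting.
    """
    H, W = hw
    uv, depth = project_points(centers_cam, K)
    ys, xs = np.mgrid[0:H, 0:W]  # ys = row index (v), xs = column index (u)
    rendered = np.zeros((H, W, sem_probs.shape[1]))
    transmittance = np.ones((H, W))
    for i in np.argsort(depth):  # nearest Gaussian first
        if depth[i] <= 0:  # behind the camera, not visible
            continue
        d2 = (xs - uv[i, 0]) ** 2 + (ys - uv[i, 1]) ** 2
        alpha = opacities[i] * np.exp(-0.5 * d2 / sigma_px ** 2)
        rendered += (transmittance * alpha)[..., None] * sem_probs[i]
        transmittance *= 1.0 - alpha
    return rendered, 1.0 - transmittance  # class scores, pixel coverage

def projection_loss(rendered, coverage, labels_2d, min_cov=0.05, eps=1e-6):
    """Cross-entropy against 2D labels, averaged over covered pixels only."""
    probs = rendered / (rendered.sum(-1, keepdims=True) + eps)
    picked = np.take_along_axis(probs, labels_2d[..., None], axis=-1)[..., 0]
    mask = coverage > min_cov
    if not mask.any():
        return 0.0
    return float(-(np.log(picked + eps) * mask).sum() / mask.sum())

# Toy usage: one Gaussian on the optical axis, two semantic classes.
K = np.array([[10.0, 0.0, 4.0], [0.0, 10.0, 4.0], [0.0, 0.0, 1.0]])
centers = np.array([[0.0, 0.0, 2.0]])
rendered, coverage = splat_semantics(
    centers, np.array([0.9]), np.array([[0.1, 0.9]]), K, (8, 8))
loss = projection_loss(rendered, coverage, np.ones((8, 8), dtype=int))
```

Because the loss is computed only on pixels the Gaussians actually cover, it supervises the camera-visible space, and since the rendering step is used only to compute a training loss, a sketch like this adds nothing to the inference path.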

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Human Pose and Action Recognition