A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision

Chensheng Peng; Ido Sobol; Masayoshi Tomizuka; Kurt Keutzer; Chenfeng Xu; Or Litany

arXiv:2412.00623 · cs.CV · July 29, 2025

Open Access

TL;DR

This paper introduces a teacher-guided diffusion framework that generates 3D Gaussian splats using only 2D supervision, capturing diverse plausible 3D structures without requiring full 3D ground-truth data.

Contribution

The paper proposes a decoupled training approach that uses the 3D predictions of deterministic models as teachers, so that 3D diffusion models can be trained solely with 2D supervision.
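The idea of pairing a 3D teacher with 2D supervision can be illustrated with a toy sketch. This is not the authors' implementation. It assumes, purely for illustration, that the 3D representation is a small point set, that rendering is a linear projection (standing in for differentiable Gaussian splat rendering), and that the denoiser is a single linear layer. The names `add_noise`, `render`, `denoise`, and `rendering_loss` are hypothetical. The point it shows is that the noisy input lives in 3D (built from a teacher prediction), while the loss is computed only on rendered 2D views.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x0, alpha_bar, rng):
    """Forward diffusion step: blend a clean 3D sample with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def render(points, camera):
    """Toy stand-in for differentiable rendering: linear projection to 2D."""
    return points @ camera.T  # (N, 3) @ (3, 2) -> (N, 2)

def denoise(x_t, weights):
    """Hypothetical one-layer linear denoiser that predicts the clean 3D sample."""
    return x_t @ weights

def rendering_loss(pred_3d, cameras, target_views):
    """Mean squared error between rendered predictions and the 2D targets."""
    errors = [np.mean((render(pred_3d, cam) - tgt) ** 2)
              for cam, tgt in zip(cameras, target_views)]
    return float(np.mean(errors))

# Toy data: the "true" 3D points are used only to synthesise the 2D views.
true_points = rng.standard_normal((64, 3))
cameras = [rng.standard_normal((2, 3)) for _ in range(3)]
target_views = [render(true_points, cam) for cam in cameras]

# Assumption: the teacher is a deterministic but imperfect 3D estimate.
teacher_points = 0.8 * true_points

# The noisy 3D input is built from the teacher output, not from 3D ground truth.
x_t = add_noise(teacher_points, alpha_bar=0.7, rng=rng)

# The supervision signal is 2D only: render the denoised 3D and compare views.
weights = np.eye(3)  # untrained placeholder weights
loss = rendering_loss(denoise(x_t, weights), cameras, target_views)
print(f"2D rendering loss: {loss:.4f}")
```

In an actual training setup, this loss would be backpropagated through a differentiable renderer into the denoiser's parameters; the sketch only evaluates it once with placeholder weights.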

Findings

01. Improves 3D generative quality over deterministic teachers

02. Works effectively on object-level and scene-level datasets

03. Demonstrates scalable 3D modeling without full 3D supervision

Abstract

We present a novel framework for training 3D image-conditioned diffusion models using only 2D supervision. Recovering 3D structure from 2D images is inherently ill-posed due to the ambiguity of possible reconstructions, making generative models a natural choice. However, most existing 3D generative models rely on full 3D supervision, which is impractical due to the scarcity of large-scale 3D datasets. To address this, we propose leveraging sparse-view supervision as a scalable alternative. While recent reconstruction models use sparse-view supervision with differentiable rendering to lift 2D images to 3D, they are predominantly deterministic, failing to capture the diverse set of plausible solutions and producing blurry predictions in uncertain regions. A key challenge in training 3D diffusion models with 2D supervision is that the standard training paradigm requires both the denoising…

Taxonomy

Topics: Augmented Reality Applications

Methods: Diffusion