Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion

Aleksandar Jevtić; Christoph Reich; Felix Wimbauer; Oliver Hahn; Christian Rupprecht; Stefan Roth; Daniel Cremers

arXiv:2507.06230 · cs.CV · July 28, 2025

TL;DR

SceneDINO introduces an unsupervised method for semantic scene completion from a single image, leveraging self-supervised learning and multi-view consistency to infer 3D geometry and semantics without ground-truth annotations.
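
This page does not spell out how the "multi-view consistency" self-supervision works. As a hedged illustration, NeRF-style methods typically supervise a 3D field by volume-rendering per-point quantities along camera rays and comparing the result with what another view observes. The sketch below assumes such a setup. `render_features` alpha-composites per-sample features along one ray. `feature_consistency_loss` is a hypothetical name, and its cosine distance is chosen for illustration only. Neither function is taken from the SceneDINO code.

```python
import numpy as np

# Illustrative sketch, not code from the SceneDINO repository: assumes a
# NeRF-style field that predicts a density and a C-dim feature per 3D sample.


def render_features(densities, deltas, features):
    """Alpha-composite per-sample features along a single camera ray.

    densities: (S,) non-negative densities sigma_i at S samples along the ray
    deltas:    (S,) distances between consecutive samples
    features:  (S, C) feature vectors predicted at those samples
    Returns the rendered (C,) feature and the (S,) compositing weights.
    """
    alphas = 1.0 - np.exp(-densities * deltas)  # opacity of each ray segment
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), i.e. an exclusive cumulative product.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas  # w_i = T_i * alpha_i, sums to at most 1
    rendered = (weights[:, None] * features).sum(axis=0)
    return rendered, weights


def feature_consistency_loss(rendered, target, eps=1e-8):
    """Hypothetical loss: 1 - cosine similarity to a 2D feature from another view."""
    cos = np.dot(rendered, target) / (np.linalg.norm(rendered) * np.linalg.norm(target) + eps)
    return float(1.0 - cos)


# Usage with random stand-in data (64 samples, 8 feature channels).
rng = np.random.default_rng(0)
densities = rng.uniform(0.0, 2.0, size=64)
deltas = np.full(64, 0.1)
features = rng.normal(size=(64, 8))
rendered, weights = render_features(densities, deltas, features)
target = rng.normal(size=8)  # stand-in for a 2D feature of the same point seen from another view
loss = feature_consistency_loss(rendered, target)
```

The weights w_i = T_i · α_i sum to at most 1, so a ray that passes mostly through empty space renders a feature with a small norm.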

Contribution

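SceneDINO is a feed-forward approach that adapts techniques from self-supervised representation learning and 2D unsupervised scene understanding to 3D scene understanding. It reaches state-of-the-art segmentation accuracy without any semantic or geometric ground truth.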

Findings

1. State-of-the-art segmentation accuracy in unsupervised SSC
2. Linear probing of the 3D features matches supervised SSC performance
3. Strong domain generalization and multi-view consistency
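
Finding 2 refers to linear probing. In a linear probe, a single linear layer is trained on top of frozen features using labeled data, so the probe's accuracy measures how linearly separable the features already are. The minimal sketch below rests on several assumptions: softmax regression trained by plain gradient descent, L2-normalized features, and synthetic clustered vectors in place of SceneDINO's 3D features. The function names are hypothetical. The paper's own probing protocol (optimizer, classes, evaluation) is not described on this page.

```python
import numpy as np

# Generic linear probe on frozen features (illustrative; the paper's exact
# probing protocol is not described on this page).


def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def train_linear_probe(feats, labels, num_classes, lr=1.0, epochs=300):
    """Fit logits = feats @ W + b with softmax cross-entropy; feats stay frozen."""
    n, c = feats.shape
    W = np.zeros((c, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(feats, W, b):
    return np.argmax(feats @ W + b, axis=1)


# Usage: synthetic 16-dim "point features" for 3 classes stand in for real 3D features.
rng = np.random.default_rng(0)
centers = 3.0 * rng.normal(size=(3, 16))
y_train = rng.integers(0, 3, size=600)
y_test = rng.integers(0, 3, size=200)
x_train = l2_normalize(centers[y_train] + rng.normal(size=(600, 16)))
x_test = l2_normalize(centers[y_test] + rng.normal(size=(200, 16)))
W, b = train_linear_probe(x_train, y_train, num_classes=3)
accuracy = float((predict(x_test, W, b) == y_test).mean())
```

Training updates only W and b and never modifies the features. For that reason, a probe measures the frozen representation rather than a fine-tuned model.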

Abstract

Semantic scene completion (SSC) aims to infer both the 3D geometry and semantics of a scene from single images. In contrast to prior work on SSC that heavily relies on expensive ground-truth annotations, we approach SSC in an unsupervised setting. Our novel method, SceneDINO, adapts techniques from self-supervised representation learning and 2D unsupervised scene understanding to SSC. Our training exclusively utilizes multi-view consistency self-supervision without any form of semantic or geometric ground truth. Given a single input image, SceneDINO infers the 3D geometry and expressive 3D DINO features in a feed-forward manner. Through a novel 3D feature distillation approach, we obtain unsupervised 3D semantics. In both 3D and 2D unsupervised scene understanding, SceneDINO reaches state-of-the-art segmentation accuracy. Linear probing our 3D features matches the segmentation accuracy…
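
The abstract attributes SceneDINO's unsupervised 3D semantics to a novel 3D feature distillation approach, which this page does not describe. As a generic stand-in only: unsupervised segmentation pipelines commonly map features to K pseudo-classes by clustering. The sketch below shows cosine k-means with farthest-point initialization. `cosine_kmeans` is a hypothetical helper, and synthetic data takes the place of real 3D features. It is not the paper's distillation method.

```python
import numpy as np

# Generic stand-in for turning features into unsupervised pseudo-classes.
# NOT SceneDINO's 3D feature distillation, which this page does not describe.


def cosine_kmeans(feats, k, iters=50, seed=0):
    """Cluster features into k groups by cosine similarity.

    Uses farthest-point initialization; returns (assignments, unit-norm centroids).
    """
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    chosen = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        # Next seed: the point least similar to every centroid chosen so far.
        sims = np.max(x @ np.array(chosen).T, axis=1)
        chosen.append(x[np.argmin(sims)])
    centroids = np.array(chosen)
    assign = np.argmax(x @ centroids.T, axis=1)
    for _ in range(iters):
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the previous centroid if a cluster empties
                c = members.mean(axis=0)
                centroids[j] = c / np.linalg.norm(c)
        new_assign = np.argmax(x @ centroids.T, axis=1)
        if np.array_equal(new_assign, assign):
            break  # converged: assignments no longer change
        assign = new_assign
    return assign, centroids


# Usage: three well-separated synthetic feature groups in 8 dimensions.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 100)
feats = 5.0 * np.eye(8)[labels] + 0.3 * rng.normal(size=(300, 8))
assign, centroids = cosine_kmeans(feats, k=3)
```

Cluster indices are arbitrary. Evaluating such pseudo-classes against ground truth therefore typically requires a matching step, such as Hungarian matching, which is omitted here.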

Code & Models

Repositories

tum-vision/scenedino (PyTorch, official)

Models

jev-aleks/SceneDINO (Hugging Face model)

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization

Methods: Vision Transformer · self-DIstillation with NO labels (DINO)