ShelfOcc: Native 3D Supervision beyond LiDAR for Vision-Based Occupancy Estimation

Simon Boeder; Fabian Gigengack; Simon Roesler; Holger Caesar; Benjamin Risse

arXiv:2511.15396 · cs.CV · November 20, 2025

TL;DR

ShelfOcc is a vision-only method for occupancy estimation that moves supervision into native 3D space: it generates metrically consistent semantic voxel labels from video, without LiDAR or manual 3D annotations, and outperforms previous weakly/shelf-supervised methods on Occ3D-nuScenes.

Contribution

The paper presents a framework that converts video into native 3D supervision without additional sensors or manual 3D annotations, so that any occupancy model can be trained on the resulting voxel labels, including in dynamic scenes.

Findings

01. Outperforms previous weakly/shelf-supervised methods by up to 34% on Occ3D-nuScenes.

02. Generates metrically consistent semantic voxel labels from video data.

03. Effectively handles dynamic content and propagates semantic information into stable 3D representations.

Abstract

Recent progress in self- and weakly supervised occupancy estimation has largely relied on 2D projection or rendering-based supervision, which suffers from geometric inconsistencies and severe depth bleeding. We thus introduce ShelfOcc, a vision-only method that overcomes these limitations without relying on LiDAR. ShelfOcc brings supervision into native 3D space by generating metrically consistent semantic voxel labels from video, enabling true 3D supervision without any additional sensors or manual 3D annotations. While recent vision-based 3D geometry foundation models provide a promising source of prior knowledge, they do not work out of the box as a prediction due to sparse or noisy and inconsistent geometry, especially in dynamic driving scenes. Our method introduces a dedicated framework that mitigates these issues by filtering and accumulating static geometry consistently across…
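
To make the idea of "semantic voxel labels" concrete, the following is a minimal sketch of one generic way to turn a set of labeled 3D points into a voxel grid by majority vote. It is not the ShelfOcc pipeline: the function name `voxelize_semantic_points`, its parameters, the `free_label` value, and the toy grid are all assumptions made for illustration, and the filtering and accumulation steps described in the abstract are not modeled here.

```python
import numpy as np


def voxelize_semantic_points(points, labels, grid_min, voxel_size,
                             grid_shape, num_classes, free_label=-1):
    """Assign each voxel the most frequent class among the points inside it.

    Hypothetical helper for illustration only; voxels that receive no
    points are marked with `free_label`.
    """
    points = np.asarray(points, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    shape = tuple(int(s) for s in grid_shape)

    # Convert metric coordinates to integer voxel indices.
    idx = np.floor((points - np.asarray(grid_min, dtype=np.float64))
                   / voxel_size).astype(np.int64)

    # Drop points that fall outside the grid bounds.
    inside = np.all((idx >= 0) & (idx < np.asarray(shape)), axis=1)
    idx, labels = idx[inside], labels[inside]

    # Count class votes per voxel (dense table; fine for a small sketch).
    flat = np.ravel_multi_index(tuple(idx.T), shape)
    votes = np.zeros((int(np.prod(shape)), num_classes), dtype=np.int64)
    np.add.at(votes, (flat, labels), 1)

    # Occupied voxels take the majority class, the rest stay unlabeled.
    grid = np.full(votes.shape[0], free_label, dtype=np.int64)
    occupied = votes.sum(axis=1) > 0
    grid[occupied] = votes[occupied].argmax(axis=1)
    return grid.reshape(shape)


# Toy example: five points, three classes, a 2x2x2 grid of 1 m voxels.
example_points = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.3, 0.1, 0.1],
                  [1.5, 0.5, 0.5], [9.0, 9.0, 9.0]]
example_labels = [2, 2, 1, 0, 1]
example_grid = voxelize_semantic_points(
    example_points, example_labels, grid_min=(0.0, 0.0, 0.0),
    voxel_size=1.0, grid_shape=(2, 2, 2), num_classes=3)
```

In this toy setup the last point lies outside the grid and is discarded, and the voxel at index (0, 0, 0) receives two votes for class 2 and one for class 1. A dense vote table like this would be memory-hungry at driving-scene resolution, so a real system would likely use a sparse representation.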

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization