QueryOcc: Query-based Self-Supervision for 3D Semantic Occupancy

Adam Lilja; Ji Lan; Junsheng Fu; Lars Hammarstrand

arXiv:2511.17221·cs.CV·November 24, 2025

Open Access

TL;DR

QueryOcc introduces a novel query-based self-supervised framework for learning continuous 3D semantic occupancy directly from sensor data, improving spatial precision and scalability for autonomous driving applications.

Contribution

The paper presents a 4D query-based approach with a contractive scene representation, enabling efficient, long-range 3D semantic occupancy learning without manual labels.
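To make the idea of a contractive scene representation concrete, here is a minimal sketch of one common contraction: the mapping popularized by Mip-NeRF 360, which leaves nearby points untouched and squashes distant points into a bounded shell. This listing does not state the exact function QueryOcc uses, so the formula, the function name `contract`, and the `inner_radius` parameter are all assumptions chosen for illustration.

```python
import numpy as np

def contract(points, inner_radius=1.0):
    """Map unbounded 3D points into a ball of radius 2 * inner_radius.

    ASSUMPTION: this is the Mip-NeRF 360 style contraction, used as a
    stand-in; it is not taken from the QueryOcc paper.
    Points with norm <= inner_radius are returned unchanged. Farther points
    are pulled inward so that infinity maps onto radius 2 * inner_radius.
    """
    points = np.asarray(points, dtype=np.float64)
    norm = np.linalg.norm(points, axis=-1, keepdims=True)
    scaled = norm / inner_radius
    # Clamp to 1 so the division below is safe for points near the origin.
    safe = np.maximum(scaled, 1.0)
    factor = (2.0 - 1.0 / safe) / safe
    return np.where(scaled <= 1.0, points, points * factor)
```

Because the contracted space is bounded, a fixed-size feature grid can cover arbitrarily distant geometry, which is one plausible way to read "long-range" learning at constant memory: resolution is simply spent more densely near the ego vehicle than far away.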

Findings

01. Surpasses previous camera-based methods by 26% in semantic RayIoU.

02. Operates at 11.6 FPS, enabling real-time applications.

03. Supports supervision from pseudo-point clouds or raw lidar data.

Abstract

Learning 3D scene geometry and semantics from images is a core challenge in computer vision and a key capability for autonomous driving. Since large-scale 3D annotation is prohibitively expensive, recent work explores self-supervised learning directly from sensor data without manual labels. Existing approaches either rely on 2D rendering consistency, where 3D structure emerges only implicitly, or on discretized voxel grids from accumulated lidar point clouds, limiting spatial precision and scalability. We introduce QueryOcc, a query-based self-supervised framework that learns continuous 3D semantic occupancy directly through independent 4D spatio-temporal queries sampled across adjacent frames. The framework supports supervision from either pseudo-point clouds derived from vision foundation models or raw lidar data. To enable long-range supervision and reasoning under constant memory,…
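The query-based supervision described in the abstract can be pictured with a small sketch: given one sensor ray (from lidar or a pseudo-point cloud), points before the return are labeled free and the return itself is labeled occupied, and each point carries a timestamp to form a 4D query. This is an illustrative assumption about how such queries might be built, not the authors' sampling scheme; `sample_ray_queries`, `n_free`, and the 0.95 margin are hypothetical.

```python
import numpy as np

def sample_ray_queries(origin, hit, timestamp, n_free=4, rng=None):
    """Build 4D (x, y, z, t) occupancy queries from a single sensor ray.

    ASSUMPTION: a simplified illustration, not the QueryOcc sampling scheme.
    Returns (queries, labels), where labels are 0 for free space and 1 for
    the occupied point at the ray's return.
    """
    rng = np.random.default_rng() if rng is None else rng
    origin = np.asarray(origin, dtype=np.float64)
    hit = np.asarray(hit, dtype=np.float64)
    # Free-space samples: random fractions of the ray, kept short of the
    # return (the 0.95 margin is an arbitrary illustrative choice).
    fractions = rng.uniform(0.0, 0.95, size=n_free)
    free_pts = origin + fractions[:, None] * (hit - origin)
    pts = np.vstack([free_pts, hit[None, :]])
    # Attach the capture time so every query is a 4D spatio-temporal point.
    t = np.full((pts.shape[0], 1), float(timestamp))
    queries = np.hstack([pts, t])
    labels = np.concatenate([np.zeros(n_free), np.ones(1)])
    return queries, labels
```

Each query is independent of any voxel grid, so supervision of this kind is not tied to a fixed grid resolution; a model would be trained to predict the label (and, in the semantic case, a class) at each queried location and time.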

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization