SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction

Suzeyu Chen; Leheng Li; Ying-Cong Chen

arXiv:2602.04240·cs.CV·February 5, 2026

TL;DR

SPOT-Occ introduces a prototype-guided sparse transformer decoder for efficient, accurate 3D occupancy prediction from camera data, targeting the real-time performance that safe autonomous-vehicle deployment requires.

Contribution

The paper proposes a novel prototype-based sparse transformer decoder with a guided feature selection mechanism and denoising paradigm, improving efficiency and accuracy in 3D occupancy prediction.

Findings

01. Outperforms previous methods in speed and accuracy

02. Uses a two-stage prototype-guided feature aggregation

03. Leverages ground-truth masks for stable query-prototype association

Abstract

Achieving highly accurate and real-time 3D occupancy prediction from cameras is a critical requirement for the safe and practical deployment of autonomous vehicles. While the shift to sparse 3D representations solves the encoding bottleneck, it creates a new challenge for the decoder: how to efficiently aggregate information from a sparse, non-uniformly distributed set of voxel features without resorting to computationally prohibitive dense attention. In this paper, we propose a novel Prototype-based Sparse Transformer Decoder that replaces this costly interaction with an efficient, two-stage process of guided feature selection and focused aggregation. Our core idea is to make the decoder's attention prototype-guided. We achieve this through a sparse prototype selection mechanism, where each query adaptively identifies a compact set of the most salient voxel features, termed…
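The two-stage idea in the abstract (each query first selects a compact set of salient voxel features, then aggregates only over that set) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the dot-product saliency score, the fixed top-k budget `k`, the scaled softmax weighting, and all function names below are assumptions chosen for illustration only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_prototypes(query, voxels, k):
    """Stage 1 (assumed form): indices of the k voxels scoring highest against the query."""
    scores = [dot(query, v) for v in voxels]
    ranked = sorted(range(len(voxels)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def aggregate(query, voxels, indices):
    """Stage 2 (assumed form): softmax attention restricted to the selected voxels."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, voxels[i]) * scale for i in indices])
    dim = len(query)
    return [
        sum(w * voxels[i][d] for w, i in zip(weights, indices))
        for d in range(dim)
    ]


def prototype_guided_attention(queries, voxels, k):
    """Run selection followed by focused aggregation for every query."""
    outputs = []
    for q in queries:
        idx = select_prototypes(q, voxels, k)
        outputs.append(aggregate(q, voxels, idx))
    return outputs


if __name__ == "__main__":
    # Toy data: four sparse voxel features and one query, all 2-dimensional.
    voxels = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.1]]
    queries = [[1.0, 0.0]]
    print(prototype_guided_attention(queries, voxels, k=2))
```

In this sketch the weighted sum in the second stage touches only `k` features per query rather than every voxel, which is the kind of saving the abstract attributes to avoiding dense attention. How the paper scores and selects prototypes is not stated in the text available here.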

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · 3D Shape Modeling and Analysis