ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder
Jungho Kim, Changwon Kang, Dongyoung Lee, Sehwan Choi, Jun Won Choi

TL;DR
ProtoOcc is a novel 3D occupancy prediction model that combines dual-branch encoding and prototype-based decoding to achieve high accuracy and efficiency in scene understanding tasks.
Contribution
It introduces a dual-branch encoder and a prototype query decoder built on scene-adaptive prototypes, enabling 3D occupancy prediction in a single step without iterative decoding.
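As a rough illustration of what single-step prototype decoding can look like, the sketch below builds one prototype per class by averaging the features of the voxels assigned to that class, then labels every voxel in one pass by its most similar prototype. The summary does not say how ProtoOcc forms or scores its prototypes. The class-wise mean, the dot-product similarity, the `coarse_labels` input (an assumed per-voxel class guess, for example from an auxiliary head), and the names `build_prototypes` and `decode_single_step` are all hypothetical choices for illustration.

```python
from typing import Dict, List

Vector = List[float]


def build_prototypes(
    features: List[Vector], coarse_labels: List[int], num_classes: int
) -> Dict[int, Vector]:
    """Hypothetical scene-adaptive prototypes: per-class mean of voxel features.

    `coarse_labels` is an assumed per-voxel class guess (e.g. from an auxiliary
    head). Classes with no voxels in this scene get no prototype.
    """
    dim = len(features[0])
    sums = {c: [0.0] * dim for c in range(num_classes)}
    counts = {c: 0 for c in range(num_classes)}
    for feat, label in zip(features, coarse_labels):
        counts[label] += 1
        for i, value in enumerate(feat):
            sums[label][i] += value
    # Average only the classes that actually occur in this scene.
    return {c: [s / counts[c] for s in sums[c]] for c in range(num_classes) if counts[c] > 0}


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def decode_single_step(features: List[Vector], prototypes: Dict[int, Vector]) -> List[int]:
    """Label every voxel with its highest-scoring prototype in one pass.

    There is no loop that refines the queries over several decoder layers.
    """
    return [max(prototypes, key=lambda c: dot(feat, prototypes[c])) for feat in features]


if __name__ == "__main__":
    # Four voxels with 2-D features; class ids 1 and 2 are made-up labels.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.8]]
    protos = build_prototypes(feats, [1, 1, 2, 2], num_classes=3)
    print(decode_single_step(feats, protos))
```

In a trained model the features and prototypes would come from learned networks; the point of the sketch is only that decoding reduces to one similarity score per voxel-prototype pair followed by an argmax, rather than several rounds of query refinement.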
Findings
Achieves 45.02% mIoU on the Occ3D-nuScenes benchmark.
Reaches 39.56% mIoU at an inference speed of 12.83 FPS.
Outperforms previous state-of-the-art methods in accuracy and efficiency.
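The two findings above mix an accuracy metric and a speed metric. mIoU is the mean, over semantic classes, of the per-class intersection-over-union between predicted and ground-truth voxel labels, and a frame rate converts to per-frame latency as 1000 / FPS milliseconds (12.83 FPS is roughly 78 ms per frame). The sketch below is a generic illustration on flat label lists; the official Occ3D-nuScenes evaluation may mask voxels or filter classes in ways this sketch does not reproduce, and the function names are hypothetical.

```python
from typing import List, Optional


def per_class_iou(pred: List[int], gt: List[int], num_classes: int) -> List[Optional[float]]:
    """IoU per class; None when a class appears in neither pred nor gt."""
    ious: List[Optional[float]] = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        ious.append(inter / union if union > 0 else None)
    return ious


def mean_iou(pred: List[int], gt: List[int], num_classes: int) -> float:
    """Average IoU over the classes that occur; multiply by 100 for a percentage."""
    valid = [iou for iou in per_class_iou(pred, gt, num_classes) if iou is not None]
    if not valid:
        raise ValueError("no class occurs in either prediction or ground truth")
    return sum(valid) / len(valid)


def fps_to_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2]
    gt = [0, 1, 1, 1, 2]
    print(per_class_iou(pred, gt, 3))  # class IoUs 1/2, 2/3, 1
    print(round(100 * mean_iou(pred, gt, 3), 2))
    print(round(fps_to_ms(12.83), 1))
```

Classes absent from both the prediction and the ground truth are skipped rather than counted as zero, which is one common convention; a benchmark's own rule takes precedence.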
Abstract
In this paper, we introduce ProtoOcc, a novel 3D occupancy prediction model designed to predict the occupancy states and semantic classes of 3D voxels through a deep semantic understanding of scenes. ProtoOcc consists of two main components: the Dual Branch Encoder (DBE) and the Prototype Query Decoder (PQD). The DBE produces a new 3D voxel representation by combining 3D voxel and BEV representations across multiple scales through a dual-branch structure. This design improves both performance and computational efficiency by giving the BEV representation a large receptive field while keeping a smaller receptive field for the voxel representation. The PQD introduces Prototype Queries to accelerate the decoding process. Scene-Adaptive Prototypes are derived from the 3D voxel features of the input sample, while Scene-Agnostic Prototypes are computed by applying Scene-Adaptive…
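To make the dual-branch idea concrete, the toy sketch below keeps a voxel branch that looks at a small 3D neighbourhood and a BEV branch that first collapses the height axis and then aggregates over a wider 2D neighbourhood, before broadcasting the BEV result back along height and adding it to every voxel. Single-channel features, mean pooling for the height collapse, box filters standing in for learned convolutions, the default radii, and fusion by addition are all assumptions for illustration; the abstract does not specify these operations, and the multi-scale aspect of the DBE is omitted.

```python
from typing import List

Grid3D = List[List[List[float]]]  # indexed [x][y][z], z is height
Grid2D = List[List[float]]        # indexed [x][y]


def box_filter_3d(vox: Grid3D, radius: int) -> Grid3D:
    """Mean over a (2r+1)^3 neighbourhood, clipped at the grid border."""
    nx, ny, nz = len(vox), len(vox[0]), len(vox[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                vals = [
                    vox[i][j][k]
                    for i in range(max(0, x - radius), min(nx, x + radius + 1))
                    for j in range(max(0, y - radius), min(ny, y + radius + 1))
                    for k in range(max(0, z - radius), min(nz, z + radius + 1))
                ]
                out[x][y][z] = sum(vals) / len(vals)
    return out


def collapse_height(vox: Grid3D) -> Grid2D:
    """Voxel -> BEV by averaging each vertical column."""
    return [[sum(col) / len(col) for col in row] for row in vox]


def box_filter_2d(bev: Grid2D, radius: int) -> Grid2D:
    """Mean over a (2r+1)^2 window in the BEV plane, clipped at the border."""
    nx, ny = len(bev), len(bev[0])
    out = [[0.0] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            vals = [
                bev[i][j]
                for i in range(max(0, x - radius), min(nx, x + radius + 1))
                for j in range(max(0, y - radius), min(ny, y + radius + 1))
            ]
            out[x][y] = sum(vals) / len(vals)
    return out


def dual_branch_fuse(vox: Grid3D, voxel_radius: int = 1, bev_radius: int = 3) -> Grid3D:
    """Hypothetical fusion: small-window voxel branch + wide-window BEV branch."""
    nx, ny, nz = len(vox), len(vox[0]), len(vox[0][0])
    local = box_filter_3d(vox, voxel_radius)
    wide = box_filter_2d(collapse_height(vox), bev_radius)
    # Broadcast each BEV cell over its height column and add it to the voxel branch.
    return [[[local[x][y][z] + wide[x][y] for z in range(nz)] for y in range(ny)] for x in range(nx)]


if __name__ == "__main__":
    grid = [[[1.0, 0.0] for _ in range(4)] for _ in range(4)]  # 4 x 4 x 2 toy grid
    print(dual_branch_fuse(grid)[0][0])
```

The efficiency argument is visible in the loop counts: the wide window runs over X×Y BEV cells, whereas an equally wide window in the voxel branch would run over X×Y×Z voxels with a (2r+1)³ neighbourhood each, so the large receptive field is paid for in 2D only.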
