ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder

Jungho Kim; Changwon Kang; Dongyoung Lee; Sehwan Choi; Jun Won Choi

arXiv:2412.08774·cs.CV·February 28, 2025

TL;DR

ProtoOcc is a novel 3D occupancy prediction model that combines dual-branch encoding and prototype-based decoding to achieve high accuracy and efficiency in scene understanding tasks.

Contribution

It introduces a dual-branch encoder and prototype query decoder with scene-adaptive prototypes, enabling single-step 3D occupancy prediction without iterative decoding.

Findings

1. Achieves 45.02% mIoU on the Occ3D-nuScenes benchmark.

2. Reaches 39.56% mIoU at an inference speed of 12.83 FPS.

3. Outperforms previous state-of-the-art methods in both accuracy and efficiency.
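
The findings above are reported in mIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal pure-Python sketch of the standard definition: per-class IoU = TP / (TP + FP + FN), averaged over classes. The function name `mean_iou`, the flat label lists, and the `ignore_index` parameter are illustrative assumptions, not the benchmark's official evaluation code.

```python
from typing import Optional, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: Optional[int] = None) -> float:
    """Mean IoU over the classes that occur in pred or target.

    pred and target hold one integer class label per voxel.
    Voxels whose target equals ignore_index (hypothetical) are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class

    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class

    # Classes absent from both pred and target are left out of the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Five voxels, three classes; one voxel of class 1 is mislabeled as 0.
    print(mean_iou([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3))
```

In the toy call, the per-class IoUs are 1/2, 2/3, and 1, so the mean is their average.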

Abstract

In this paper, we introduce ProtoOcc, a novel 3D occupancy prediction model designed to predict the occupancy states and semantic classes of 3D voxels through a deep semantic understanding of scenes. ProtoOcc consists of two main components: the Dual Branch Encoder (DBE) and the Prototype Query Decoder (PQD). The DBE produces a new 3D voxel representation by combining 3D voxel and BEV representations across multiple scales through a dual branch structure. This design enhances both performance and computational efficiency by providing a large receptive field for the BEV representation while maintaining a smaller receptive field for the voxel representation. The PQD introduces Prototype Queries to accelerate the decoding process. Scene-Adaptive Prototypes are derived from the 3D voxel features of the input sample, while Scene-Agnostic Prototypes are computed by applying Scene-Adaptive…
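
To make the idea of prototype-based, single-pass decoding more concrete, the following is a rough pure-Python sketch. It is not the authors' implementation: the abstract does not say how the prototypes are computed, so the probability-weighted averaging in `class_prototypes`, the dot-product scoring in `decode`, and both function names are assumptions chosen for illustration only.

```python
from typing import List

Vector = List[float]


def class_prototypes(features: List[Vector], class_probs: List[Vector],
                     num_classes: int) -> List[Vector]:
    """One prototype per class: probability-weighted mean of voxel features.

    ASSUMPTION: weighted average pooling stands in for whatever procedure
    the paper uses to derive prototypes from the voxel features.
    """
    dim = len(features[0])
    protos = [[0.0] * dim for _ in range(num_classes)]
    mass = [0.0] * num_classes

    for feat, probs in zip(features, class_probs):
        for c in range(num_classes):
            mass[c] += probs[c]
            for d in range(dim):
                protos[c][d] += probs[c] * feat[d]

    # Normalise each class sum by its total weight; empty classes stay zero.
    for c in range(num_classes):
        if mass[c] > 0:
            protos[c] = [v / mass[c] for v in protos[c]]
    return protos


def decode(features: List[Vector], protos: List[Vector]) -> List[int]:
    """Label each voxel with its highest-scoring prototype (dot product).

    A single pass over the voxels, with no iterative refinement.
    """
    labels = []
    for feat in features:
        scores = [sum(a * b for a, b in zip(feat, p)) for p in protos]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # three toy voxel features
    probs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # coarse class assignment
    protos = class_prototypes(feats, probs, num_classes=2)
    print(decode(feats, protos))
```

The point of the sketch is only that, once class prototypes exist, every voxel can be labeled in one comparison step, which is the kind of shortcut a prototype query decoder can exploit to avoid iterative decoding.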

Code & Models

Repositories

spa-junghokim/protoocc
PyTorch · Official

Models

🤗
junghokim/ProtoOcc
model

Videos

ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Adam · Dropout · Position-Wise Feed-Forward Layer · Softmax · Dense Connections · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Label Smoothing