HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction

Xiao Zhao, Bo Chen, Mingyang Sun, Dingkang Yang, Youxing Wang, Xukun Zhang, Mingcheng Li, Dongliang Kou, Xiaoyi Wei, and Lihua Zhang

arXiv:2408.09104·cs.CV·August 20, 2024

TL;DR

HybridOcc combines a Transformer branch and a NeRF branch for 3D semantic scene completion, inferring both visible and occluded scene geometry in autonomous driving scenarios.

Contribution

The paper proposes HybridOcc, combining Transformer and NeRF representations within a coarse-to-fine framework for improved 3D occupancy prediction.

Findings

1. Outperforms existing SSC methods on the nuScenes and SemanticKITTI datasets.
2. Infers occluded scene geometry, including invisible voxels.
3. Introduces occupancy-aware ray sampling for enhanced scene understanding.
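The page names occupancy-aware ray sampling but does not describe how it works. As an illustrative assumption only, one common way to make ray sampling occupancy-aware is inverse-CDF importance sampling: split the ray into depth bins, weight each bin by a coarse occupancy probability, and draw more samples where the coarse grid expects surfaces. The helper `occupancy_guided_samples` and its `eps` floor are hypothetical names and do not come from HybridOcc.

```python
import bisect
import random
from typing import List, Optional, Sequence


def occupancy_guided_samples(
    bin_edges: Sequence[float],
    bin_occupancy: Sequence[float],
    num_samples: int,
    eps: float = 1e-3,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw depths along one ray, favoring bins a coarse grid marks as occupied.

    Hypothetical helper (not HybridOcc's method): inverse-CDF importance
    sampling over depth bins, where each bin's weight is its coarse occupancy
    probability plus eps, so empty bins are still sampled occasionally.
    """
    if len(bin_edges) != len(bin_occupancy) + 1:
        raise ValueError("need exactly one more bin edge than occupancy values")
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    weights = [max(p, 0.0) + eps for p in bin_occupancy]
    total = sum(weights)
    # Cumulative distribution over bins (last entry is ~1.0).
    cdf: List[float] = []
    acc = 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    samples: List[float] = []
    for _ in range(num_samples):
        u = rng.random()
        # First bin whose CDF reaches u; clamp guards against float round-off.
        i = min(bisect.bisect_left(cdf, u), len(cdf) - 1)
        lo, hi = bin_edges[i], bin_edges[i + 1]
        samples.append(lo + rng.random() * (hi - lo))  # uniform within the chosen bin
    return sorted(samples)
```

With `bin_occupancy=[0, 0, 1, 0]`, nearly all samples fall in the third bin; the `eps` floor keeps a few samples in bins the coarse grid considers empty, so regions it misses are still probed.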

Abstract

Vision-based 3D semantic scene completion (SSC) describes autonomous driving scenes through 3D volume representations. However, scene surfaces occlude invisible voxels, which makes it challenging for current SSC methods to hallucinate refined 3D geometry. This paper proposes HybridOcc, a hybrid 3D volume query proposal method generated by a Transformer framework and a NeRF representation and refined in a coarse-to-fine SSC prediction framework. HybridOcc aggregates contextual features through the Transformer paradigm based on hybrid query proposals, while combining it with a NeRF representation to obtain depth supervision. The Transformer branch contains multiple scales and uses spatial cross-attention for the 2D-to-3D transformation. The newly designed NeRF branch implicitly infers scene occupancy through volume rendering, including visible and invisible voxels, and explicitly captures scene…
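The abstract states that the NeRF branch infers occupancy through volume rendering and uses it for depth supervision, but the exact formulation is not given here. A minimal sketch, assuming each sample's predicted occupancy probability is treated as its opacity (standard NeRF-style alpha compositing), renders an expected depth along one ray as $\hat{d} = \sum_i w_i t_i / \sum_i w_i$ with $w_i = \alpha_i \prod_{j<i}(1-\alpha_j)$, then compares it with a target depth. `render_depth` and `depth_l1_loss` are hypothetical names, and the L1 loss is an assumed choice of supervision term.

```python
from typing import List, Sequence, Tuple


def render_depth(
    depths: Sequence[float], occupancy: Sequence[float]
) -> Tuple[float, List[float]]:
    """Alpha-composite per-sample occupancy along a ray into an expected depth.

    Assumption: occupancy[i] in [0, 1] is used directly as the opacity of
    sample i. Returns (expected depth, per-sample weights); the depth is NaN
    if the ray never hits anything.
    """
    if len(depths) != len(occupancy):
        raise ValueError("depths and occupancy must have equal length")
    transmittance = 1.0  # probability the ray reaches the current sample
    weights: List[float] = []
    for alpha in occupancy:
        alpha = min(max(alpha, 0.0), 1.0)
        weights.append(transmittance * alpha)  # probability the ray stops here
        transmittance *= 1.0 - alpha
    total = sum(weights)
    if total == 0.0:
        return float("nan"), weights
    depth = sum(w * d for w, d in zip(weights, depths)) / total
    return depth, weights


def depth_l1_loss(pred: float, target: float) -> float:
    """Hypothetical depth-supervision term: L1 distance to a target depth."""
    return abs(pred - target)
```

With `occupancy=[0, 0, 1, 0.5]` at depths `[1, 2, 3, 4]`, the third sample is fully opaque, so it takes all of the weight and the rendered depth is 3.0; the 0.5 at depth 4 lies behind it and contributes nothing.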

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax