FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

Jiawei Hou; Xiaoyan Li; Wenhao Guan; Gang Zhang; Di Feng; Yuheng Du; Xiangyang Xue; Jian Pu

arXiv:2403.02710 · cs.CV · March 6, 2024 · 2 cites



Open Access

TL;DR

FastOcc accelerates 3D occupancy prediction for autonomous driving by replacing the heavy 3D convolutions of the occupancy prediction head with lightweight 2D BEV convolutions, while still reporting state-of-the-art accuracy on Occ3D-nuScenes.
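To give a rough sense of why moving from 3D to 2D convolutions can cut cost, the sketch below counts multiply-accumulate operations for a single dense 3x3x3 convolution over a voxel grid against a single 3x3 convolution over the corresponding BEV grid. The 200 x 200 x 16 grid and the channel width of 32 are illustrative assumptions, not figures taken from this page, and `conv_macs` is a hypothetical helper. Real BEV heads often use wider channels, so the actual gap is likely smaller than this idealized ratio.

```python
def conv_macs(spatial, c_in, c_out, kernel):
    """Multiply-accumulates for one stride-1, same-padded dense convolution."""
    positions = 1
    for size in spatial:
        positions *= size                 # number of output locations
    taps = kernel ** len(spatial)         # 9 taps in 2D, 27 taps in 3D
    return positions * taps * c_in * c_out


# Assumed sizes, for illustration only.
X, Y, Z = 200, 200, 16   # voxel grid (assumption)
C = 32                   # channel width (assumption)

macs_3d = conv_macs((X, Y, Z), C, C, kernel=3)
macs_2d = conv_macs((X, Y), C, C, kernel=3)
ratio = macs_3d / macs_2d

print(f"3D conv: {macs_3d:,} MACs")
print(f"2D conv: {macs_2d:,} MACs")
print(f"ratio:   {ratio:.0f}x")
```

Under equal channel widths the ratio is simply Z x 27 / 9, so it grows linearly with the height of the voxel grid.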

Contribution

The paper introduces a residual-like architecture that replaces 3D convolutions with lightweight 2D BEV convolutions and feature interpolation, boosting inference speed without sacrificing accuracy.
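The residual-like idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: features are scalars rather than channel vectors, `fuse` and `bilinear_sample` are hypothetical helpers, and `project` is a made-up stand-in for the camera projection that maps a voxel to image coordinates. Each voxel takes its main feature from the 2D BEV map and adds a compensation term interpolated from the image feature map.

```python
import math


def bilinear_sample(feat, u, v):
    """Bilinearly interpolate feat[row][col] at fractional (u=col, v=row)."""
    h, w = len(feat), len(feat[0])
    u = min(max(u, 0.0), w - 1.0)         # clamp to the map borders
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    top = feat[v0][u0] * (1 - du) + feat[v0][u1] * du
    bottom = feat[v1][u0] * (1 - du) + feat[v1][u1] * du
    return top * (1 - dv) + bottom * dv


def fuse(bev, image_feat, project, depth):
    """voxel[z][y][x] = BEV feature + image feature sampled at project(x, y, z)."""
    return [[[bev[y][x] + bilinear_sample(image_feat, *project(x, y, z))
              for x in range(len(bev[0]))]
             for y in range(len(bev))]
            for z in range(depth)]


# Toy inputs (assumptions for illustration only).
bev = [[1.0, 2.0], [3.0, 4.0]]             # output of the 2D BEV branch
image_feat = [[0.0, 10.0], [20.0, 30.0]]   # perspective-view feature map


def project(x, y, z):
    """Hypothetical voxel-to-image mapping; a real one uses camera calibration."""
    return 0.5 * x, 0.5 * z


voxels = fuse(bev, image_feat, project, depth=2)
print(voxels[1][1][1])
```

The BEV term is shared by every voxel in a vertical column, so the heavy computation stays in 2D; only the cheap interpolated term varies with height.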

Findings

01. FastOcc achieves state-of-the-art results on the Occ3D-nuScenes benchmark.

02. The new architecture significantly improves inference speed.

03. Accuracy remains comparable to that of existing methods.

Abstract

In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels, giving a more comprehensive understanding of 3D scenes than traditional perception tasks such as 3D object detection and bird's-eye view (BEV) semantic segmentation. Recent research has extensively explored various aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design, aiming to achieve superior performance. However, inference speed, which is crucial for running on an autonomous vehicle, has been neglected. To this end, a new method, dubbed FastOcc, is proposed. By carefully analyzing the network effect and latency of four parts, namely the input image resolution, image backbone, view transformation, and occupancy prediction head, it is found that the occupancy prediction head holds considerable potential for accelerating…

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Advanced Image and Video Retrieval Techniques

Methods: 3D Convolution · Convolution