OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction

Jian Liu; Sipeng Zhang; Chuixin Kong; Wenyuan Zhang; Yuhang Wu; Yikang Ding; Borun Xu; Ruibo Ming; Donglai Wei; Xianming Liu

arXiv:2402.18140·cs.CV·February 29, 2024·3 cites

Open Access

TL;DR

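This technical report introduces occTransformer, a camera-only 3D occupancy prediction method for autonomous driving that improves on BEVFormer through data augmentation, a stronger image backbone, a 3D UNet head, additional loss functions, and model ensembling. It reaches 49.23 mIoU on the 3D occupancy prediction track of the CVPR 2023 autonomous driving challenge.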

Contribution

We developed occTransformer, which combines data augmentation, a strong image backbone, a 3D UNet head, additional loss functions, and ensembling with other occupancy models to improve 3D occupancy prediction over the BEVFormer baseline.

Findings

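01

Achieved 49.23 mIoU on the 3D occupancy prediction track.

02

Data augmentation improved model generalization, and the 3D UNet head better captured the spatial information of the scene.

03

Ensembling with BEVDet and SurroundOcc and integrating the 3D detection model StreamPETR further improved performance.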
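
For readers unfamiliar with the mIoU metric reported above, here is a minimal sketch of how mean intersection-over-union can be computed from per-voxel class labels. The function name `mean_iou`, the flat-list input layout, and the `ignore_label` parameter are illustrative assumptions. The challenge's official evaluation (for example, any visibility masking it applies) may differ in detail.

```python
def mean_iou(pred, target, num_classes, ignore_label=None):
    """Mean IoU over classes for two flat lists of per-voxel class labels.

    Illustrative sketch only; not the challenge's official evaluation code.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip voxels excluded from evaluation
        if p == t:
            # The voxel belongs to both prediction and ground truth of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # The voxel enlarges the union of the predicted and the true class.
            union[p] += 1
            union[t] += 1
    # Average only over classes that appear in the prediction or ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In this toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A score of 49.23 mIoU corresponds to 0.4923 on this scale.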

Abstract

This technical report presents our solution, "occTransformer", for the 3D occupancy prediction track of the autonomous driving challenge at CVPR 2023. Our method builds on the strong baseline BEVFormer and improves its performance through several simple yet effective techniques. First, we employed data augmentation to increase the diversity of the training data and improve the model's generalization ability. Second, we used a strong image backbone to extract more informative features from the input data. Third, we incorporated a 3D UNet head to better capture the spatial information of the scene. Fourth, we added more loss functions to better optimize the model. Additionally, we used an ensemble approach with the occupancy models BEVDet and SurroundOcc to further improve performance. Most importantly, we integrated the 3D detection model StreamPETR to enhance the model's ability to…
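
The abstract does not say how the ensemble combines model outputs. One common approach, shown here purely as an assumption, is to take a weighted average of each model's per-voxel class probabilities and pick the highest-scoring class. The function name `ensemble_voxels`, the nested-list layout, and the weights are hypothetical and not taken from the report.

```python
def ensemble_voxels(model_probs, weights=None):
    """Fuse per-voxel class probabilities from several models.

    model_probs: list over models; each entry is a list over voxels;
                 each voxel is a list of class probabilities.
    weights:     optional per-model weights (assumed, defaults to uniform).
    Returns one fused class label per voxel.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    labels = []
    for v in range(len(model_probs[0])):
        n_classes = len(model_probs[0][v])
        # Weighted sum of each model's probability for every class.
        fused = [
            sum(w * probs[v][c] for w, probs in zip(weights, model_probs))
            for c in range(n_classes)
        ]
        # Pick the class with the highest fused score.
        labels.append(max(range(n_classes), key=fused.__getitem__))
    return labels


# Two hypothetical models, two voxels, two classes.
model_a = [[0.6, 0.4], [0.2, 0.8]]
model_b = [[0.3, 0.7], [0.4, 0.6]]
print(ensemble_voxels([model_a, model_b]))              # uniform weights
print(ensemble_voxels([model_a, model_b], [0.8, 0.2]))  # favor model_a
```

Averaging probabilities rather than voting on hard labels lets a confident model outweigh an uncertain one. Per-model weights would typically be tuned on a validation split.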

Taxonomy

Topics: Advanced Vision and Imaging · Image and Video Quality Assessment · Video Surveillance and Tracking Methods