OccCylindrical: Multi-Modal Fusion with Cylindrical Representation for 3D Semantic Occupancy Prediction

Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Yaoqi Huang, Hongyu Lyu, Nguyen Hoang Khoi Tran, Tzu-Yun Tseng, and Stewart Worrall

arXiv:2505.03284 · cs.CV · May 7, 2025

Open Access

TL;DR

OccCylindrical fuses multi-modal sensor features in cylindrical coordinates for 3D semantic occupancy prediction, preserving more fine-grained geometric detail and achieving state-of-the-art results on the nuScenes dataset.

Contribution

The paper proposes a fusion method that merges and refines features from different sensor modalities in cylindrical coordinates, preserving fine-grained geometric detail that is lost by Cartesian-based fusion methods.

Findings

1. Achieves state-of-the-art performance on the nuScenes dataset.

2. Remains effective in challenging rainy and nighttime conditions.

3. Outperforms existing multisensor fusion approaches.

Abstract

The safe operation of autonomous vehicles (AVs) depends heavily on their understanding of their surroundings. To this end, the task of 3D semantic occupancy prediction divides the space around the sensors into voxels and labels each voxel with both occupancy and semantic information. Recent perception models have used multisensor fusion to perform this task. However, existing multisensor fusion-based approaches focus mainly on using sensor information in the Cartesian coordinate system. This ignores the distribution of the sensor readings, leading to a loss of fine-grained details and performance degradation. In this paper, we propose OccCylindrical, which merges and refines the features of the different modalities under cylindrical coordinates. Our method preserves more fine-grained geometric detail, which leads to better performance. Extensive experiments conducted on the nuScenes dataset,…
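To make the cylindrical representation mentioned in the abstract concrete, the following is a minimal sketch of how a 3D point could be assigned to a cylindrical voxel. It is a generic illustration, not the authors' fusion method: the function names, grid extents, and bin counts (`rho_max`, `z_min`, `z_max`, `n_rho`, `n_phi`, `n_z`) are all assumptions chosen for the example.

```python
import math


def cartesian_to_cylindrical(x, y, z):
    """Convert a Cartesian point to (rho, phi, z); phi is in [-pi, pi]."""
    rho = math.hypot(x, y)   # radial distance from the sensor's vertical axis
    phi = math.atan2(y, x)   # azimuth angle
    return rho, phi, z


def cylindrical_voxel_index(point, rho_max=50.0, z_min=-5.0, z_max=3.0,
                            n_rho=100, n_phi=360, n_z=16):
    """Return the (i_rho, i_phi, i_z) cell of a point, or None if out of range.

    All grid parameters are hypothetical defaults, not values from the paper.
    """
    rho, phi, z = cartesian_to_cylindrical(*point)
    if rho >= rho_max or not (z_min <= z < z_max):
        return None
    i_rho = int(rho / rho_max * n_rho)
    # Wrap with modulo so that phi == pi falls into the first azimuth bin.
    i_phi = int((phi + math.pi) / (2 * math.pi) * n_phi) % n_phi
    i_z = int((z - z_min) / (z_max - z_min) * n_z)
    return i_rho, i_phi, i_z


def cell_arc_length(rho, n_phi=360):
    """Approximate azimuthal width (in metres) of a cell at radius rho."""
    return rho * 2 * math.pi / n_phi


if __name__ == "__main__":
    print(cylindrical_voxel_index((2.0, 1.0, 0.5)))
    print(cell_arc_length(5.0), cell_arc_length(50.0))
```

With a fixed number of azimuth bins, cells close to the sensor are narrower than cells far away, which is one common motivation for matching a grid to the radial density of sensor readings; a uniform Cartesian grid has the same cell size everywhere.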

Taxonomy

Topics: Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques · Handwritten Text Recognition Techniques

Methods: Focus