InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction
Zhenxing Ming, Julie Stephany Berrio, Mao Shan, and Stewart Worrall

TL;DR
InverseMatrixVT3D introduces a projection matrix-based method for efficient 3D occupancy prediction from multi-view images. It avoids depth estimation, device-specific operators, and transformer queries, and achieves top results on the nuScenes and SemanticKITTI autonomous driving datasets.
Contribution
The paper presents a novel projection matrix approach that simplifies 3D volume construction and enhances efficiency in 3D occupancy prediction.
Findings
Achieves top performance on nuScenes and SemanticKITTI datasets.
Efficiently generates 3D volumes using matrix multiplications with sparse projection matrices.
Outperforms existing methods in detecting vulnerable road users.
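
The idea of generating a feature volume by multiplying image features with a sparse projection matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_projection`, `sparse_matmul`), the dictionary-based sparse storage, and the choice to average the pixels mapped to each voxel are all assumptions made for illustration. The paper uses two such matrices (one for global BEV features, one for local 3D volumes); the sketch shows a single one.

```python
# Illustrative sketch only: names, storage format, and averaging weights are
# assumptions, not taken from the paper's code.

def build_projection(num_voxels, voxel_to_pixels):
    """Build a sparse projection matrix as {voxel_row: [(pixel_col, weight), ...]}.

    voxel_to_pixels maps a voxel index to the flattened pixel indices it samples.
    The mapping is static, so it can be computed once and reused for every frame.
    Each non-empty row is normalised so the voxel receives the mean of its pixels.
    """
    rows = {}
    for voxel, pixels in voxel_to_pixels.items():
        if not 0 <= voxel < num_voxels:
            raise ValueError(f"voxel index {voxel} out of range")
        if pixels:
            weight = 1.0 / len(pixels)
            rows[voxel] = [(pixel, weight) for pixel in pixels]
    return rows


def sparse_matmul(rows, num_voxels, features):
    """Multiply a sparse (num_voxels x num_pixels) matrix by dense features.

    features is a list of per-pixel channel vectors (num_pixels x channels).
    Only the stored non-zero entries are visited, so empty voxels cost nothing.
    """
    channels = len(features[0])
    volume = [[0.0] * channels for _ in range(num_voxels)]
    for voxel, entries in rows.items():
        for pixel, weight in entries:
            for c in range(channels):
                volume[voxel][c] += weight * features[pixel][c]
    return volume


# Toy example: 4 flattened pixels with 2 channels each, projected onto 3 voxels.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
projection = build_projection(3, {0: [0, 1], 2: [3]})
volume = sparse_matmul(projection, 3, features)
```

Here voxel 0 averages pixels 0 and 1, voxel 1 sees no pixels and stays zero, and voxel 2 copies pixel 3. In practice a GPU sparse-matrix library would replace the Python loops; the point of the sketch is only that the projection is fixed and the per-frame work reduces to one sparse multiplication.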
Abstract
This paper introduces InverseMatrixVT3D, an efficient method for transforming multi-view image features into 3D feature volumes for 3D semantic occupancy prediction. Existing methods for constructing 3D volumes often rely on depth estimation, device-specific operators, or transformer queries, which hinders the widespread adoption of 3D occupancy models. In contrast, our approach leverages two projection matrices to store the static mapping relationships and matrix multiplications to efficiently generate global Bird's Eye View (BEV) features and local 3D feature volumes. Specifically, we achieve this by performing matrix multiplications between multi-view image feature maps and two sparse projection matrices. We introduce a sparse matrix handling technique for the projection matrices to optimize GPU memory usage. Moreover, a global-local attention fusion module is proposed to integrate…
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
Methods: Global-Local Attention
