InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Zhenxing Ming, Julie Stephany Berrio, Mao Shan, and Stewart Worrall

arXiv:2401.12422 · cs.CV · April 30, 2024 · 1 citation

TL;DR

InverseMatrixVT3D introduces a projection matrix-based method for efficient 3D occupancy prediction from multi-view images. It avoids depth estimation and transformer queries, and reports top results on the nuScenes and SemanticKITTI autonomous driving datasets.

Contribution

The paper presents a novel projection matrix approach that simplifies 3D volume construction and enhances efficiency in 3D occupancy prediction.

Findings

01  Achieves top performance on the nuScenes and SemanticKITTI datasets.

02  Efficiently generates 3D volumes using matrix multiplications with sparse projection matrices.

03  Outperforms existing methods in detecting vulnerable road users.

Abstract

This paper introduces InverseMatrixVT3D, an efficient method for transforming multi-view image features into 3D feature volumes for 3D semantic occupancy prediction. Existing methods for constructing 3D volumes often rely on depth estimation, device-specific operators, or transformer queries, which hinders the widespread adoption of 3D occupancy models. In contrast, our approach leverages two projection matrices to store the static mapping relationships and matrix multiplications to efficiently generate global Bird's Eye View (BEV) features and local 3D feature volumes. Specifically, we achieve this by performing matrix multiplications between multi-view image feature maps and two sparse projection matrices. We introduce a sparse matrix handling technique for the projection matrices to optimize GPU memory usage. Moreover, a global-local attention fusion module is proposed to integrate…
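The core idea in the abstract, multiplying image feature maps by a precomputed sparse projection matrix, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the voxel-to-pixel mapping, and the row normalisation (each voxel takes the mean of its assigned pixels) are assumptions made for illustration. The matrix is kept in coordinate (COO) form so the dense voxel-by-pixel matrix is never materialised.

```python
import numpy as np


def build_projection_coo(voxel_to_pixels, num_pixels):
    """Build a sparse projection matrix in COO form (rows, cols, vals).

    voxel_to_pixels[v] lists the flat pixel indices assigned to voxel v.
    This static mapping is hypothetical here; in practice it would be
    derived once from the camera calibration. Each row is normalised so
    that a voxel receives the mean of its pixels' features (an assumption).
    """
    rows, cols, vals = [], [], []
    for v, pixels in enumerate(voxel_to_pixels):
        for p in pixels:
            if not 0 <= p < num_pixels:
                raise ValueError(f"pixel index {p} out of range")
            rows.append(v)
            cols.append(p)
            vals.append(1.0 / len(pixels))
    return (
        np.array(rows, dtype=np.int64),
        np.array(cols, dtype=np.int64),
        np.array(vals, dtype=np.float64),
    )


def sparse_project(rows, cols, vals, features, num_voxels):
    """Compute P @ features using only the non-zero entries of P.

    features has shape (num_pixels, channels); the result has shape
    (num_voxels, channels).
    """
    features = np.asarray(features, dtype=np.float64)
    out = np.zeros((num_voxels, features.shape[1]), dtype=np.float64)
    # Scatter-add each weighted pixel feature into its voxel row.
    np.add.at(out, rows, vals[:, None] * features[cols])
    return out


# Toy example: 4 pixels with 2 channels each, projected onto 3 voxels.
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
voxel_to_pixels = [[0, 1], [], [3]]  # voxel 1 sees no pixel
rows, cols, vals = build_projection_coo(voxel_to_pixels, num_pixels=4)
volume = sparse_project(rows, cols, vals, features, num_voxels=3)
print(volume)  # [[2. 3.] [0. 0.] [7. 8.]]
```

Because the mapping is static, the matrix would be built once and reused for every frame. Under the same assumptions, a second mapping from BEV cells to pixels could be passed through the same routine to obtain BEV features.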

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

danielming123/inversematrixvt3d (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization

Methods: Global-Local Attention