*: Improving the 3D detector by introducing Voxel2Pillar feature encoding and extracting multi-scale features

Xusheng Li, Chengliang Wang, Shumao Wang, Zhuo Zeng, and Ji Liu

arXiv:2405.09828 · cs.CV · November 14, 2024

TL;DR

This paper introduces a novel pillar-based 3D detection scheme with Voxel2Pillar encoding and a multi-scale sparse backbone, significantly improving detection accuracy for autonomous driving applications.

Contribution

It proposes Voxel2Pillar feature encoding and a multi-scale sparse backbone, enhancing feature richness and detection performance without large convolution kernels.

Findings

1. Improved detection accuracy on the Waymo dataset for vehicles, pedestrians, and cyclists.
2. Effective multi-scale feature extraction without large convolution kernels.
3. Ablation studies confirm the contribution of each module.

Abstract

Multi-line LiDAR is widely used in autonomous vehicles, so point cloud-based 3D detectors are essential for autonomous driving. Because object types differ significantly in size, extracting rich multi-scale features is crucial for these detectors. However, because of real-time requirements, large convolution kernels are rarely used to extract large-scale features in the backbone. Current 3D detectors commonly use feature pyramid networks to obtain large-scale features; however, objects that contain few points are further lost during down-sampling, resulting in degraded performance. Since pillar-based schemes require much less computation than voxel-based schemes, they are more suitable for constructing real-time 3D detectors. Hence, we propose the *, a pillar-based scheme. We redesigned the feature…
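
To make the abstract's pillar-versus-voxel cost argument concrete, the following is a minimal sketch, not the paper's Voxel2Pillar encoding. It assumes a synthetic, uniformly random point cloud and a hypothetical helper `occupied_cells`; the cell size and scene extent are illustrative values only. Voxels bin points on (x, y, z), while pillars bin on (x, y) alone and span the full height, so a pillar grid never has more occupied cells than the voxel grid built from the same points.

```python
import random


def occupied_cells(points, cell_size, use_height):
    """Count distinct occupied grid cells (hypothetical helper).

    use_height=True  -> voxels: bin on (x, y, z)
    use_height=False -> pillars: bin on (x, y) only, spanning the full height
    """
    cells = set()
    for x, y, z in points:
        ix = int(x // cell_size)
        iy = int(y // cell_size)
        if use_height:
            cells.add((ix, iy, int(z // cell_size)))
        else:
            cells.add((ix, iy))
    return len(cells)


random.seed(0)
# Assumed synthetic scene: 5000 points in a 40 m x 40 m x 4 m volume.
points = [
    (random.uniform(0, 40), random.uniform(0, 40), random.uniform(0, 4))
    for _ in range(5000)
]

num_voxels = occupied_cells(points, cell_size=0.5, use_height=True)
num_pillars = occupied_cells(points, cell_size=0.5, use_height=False)
print("occupied voxels:", num_voxels)
print("occupied pillars:", num_pillars)
```

Fewer occupied cells means fewer locations for a sparse backbone to process, which is one way to read the abstract's claim that pillar-based schemes need less computation. The trade-off is that height information is collapsed inside each pillar.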

Taxonomy

Topics: Advanced Neural Network Applications · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques

Methods: ConvNeXt · Convolution