*: Improving the 3D detector by introducing Voxel2Pillar feature encoding and extracting multi-scale features
Xusheng Li, Chengliang Wang, Shumao Wang, Zhuo Zeng, and Ji Liu

TL;DR
This paper introduces a novel pillar-based 3D detection scheme with Voxel2Pillar encoding and a multi-scale sparse backbone, significantly improving detection accuracy for autonomous driving applications.
Contribution
It proposes Voxel2Pillar feature encoding and a multi-scale sparse backbone, enhancing feature richness and detection performance without large convolution kernels.
Findings
Improved detection accuracy on the Waymo dataset for vehicles, pedestrians, and cyclists.
Effective multi-scale feature extraction without large convolution kernels.
Ablation studies confirm the contribution of each module.
Abstract
Multi-line LiDAR is widely used in autonomous vehicles, so point cloud-based 3D detectors are essential for autonomous driving. Because object types differ significantly in size, these detectors need to extract rich multi-scale features. However, because of real-time requirements, large convolution kernels are rarely used in the backbone to extract large-scale features. Current 3D detectors commonly use feature pyramid networks to obtain large-scale features; however, objects that contain few points are further lost during down-sampling, resulting in degraded performance. Since pillar-based schemes require much less computation than voxel-based schemes, they are more suitable for constructing real-time 3D detectors. Hence, we propose the *, a pillar-based scheme. We redesigned the feature…
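The abstract's point that pillar-based schemes need less computation than voxel-based ones can be illustrated with a generic sketch. This is not the paper's Voxel2Pillar encoding. It only counts the non-empty grid cells a backbone would have to process when a point cloud is binned in 3D (voxels) versus in the ground plane only (pillars). The function name `occupied_cells`, the cell size, and the synthetic point cloud are all assumptions made for illustration.

```python
import numpy as np

def occupied_cells(points, cell_size, collapse_z):
    """Count non-empty grid cells for an (N, 3) array of x, y, z points.

    collapse_z=False bins along all three axes (voxels).
    collapse_z=True ignores height, so each cell is a vertical pillar.
    Hypothetical helper for illustration, not taken from the paper.
    """
    idx = np.floor(np.asarray(points) / cell_size).astype(np.int64)
    if collapse_z:
        idx = idx[:, :2]  # keep only the x and y bin indices
    return len(np.unique(idx, axis=0))

# Synthetic stand-in for a LiDAR sweep: uniform points in a 40 m x 40 m x 4 m box.
rng = np.random.default_rng(0)
points = rng.uniform([0, 0, -2], [40, 40, 2], size=(5000, 3))

n_voxels = occupied_cells(points, 0.5, collapse_z=False)
n_pillars = occupied_cells(points, 0.5, collapse_z=True)
print(f"occupied voxels: {n_voxels}, occupied pillars: {n_pillars}")
```

Collapsing the height axis can only merge cells, so the pillar count never exceeds the voxel count. How large the gap is on real data depends on the sensor, the scene, and the grid resolution.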
Taxonomy
Topics: Advanced Neural Network Applications · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques
Methods: ConvNeXt · Convolution
