Optimized CNNs for Rapid 3D Point Cloud Object Recognition
Tianyi Lyu, Dian Gu, Peiyuan Chen, Yaoting Jiang, Zhenhong Zhang, Huadong Pang, Li Zhou, Yiping Dong

TL;DR
This paper presents a sparse CNN architecture with $L_1$ regularization for efficient 3D object detection in point clouds, reporting higher accuracy than prior methods at competitive processing speed on the MVTec 3D-AD benchmark.
Contribution
It introduces a sparse convolutional layer design with $L_1$ regularization, improving 3D point cloud object recognition efficiency and accuracy over prior methods.
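As a rough illustration of how such a regularizer can enter training, the sketch below adds an L1 term on intermediate activations to a task loss. It assumes the penalty is applied to layer outputs, as the abstract describes; the function name `l1_activation_penalty`, the `weight` value, and the placeholder `task_loss` are hypothetical and not taken from the paper.

```python
import numpy as np

def l1_activation_penalty(activations, weight=1e-3):
    """Return weight * sum of |a| over every activation map in the list.

    `activations` is a list of arrays, one per intermediate layer.
    The default weight is an arbitrary illustrative value.
    """
    return weight * sum(np.abs(a).sum() for a in activations)

# Toy activations for two hypothetical layers.
layer_outputs = [np.array([1.0, -2.0, 0.0]), np.array([[0.5, 0.0]])]

task_loss = 0.25  # placeholder for a detection loss
total_loss = task_loss + l1_activation_penalty(layer_outputs, weight=0.1)
```

Penalizing the absolute value pushes many activations toward zero, which is what lets later sparse layers skip more cells.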
Findings
Outperforms previous state-of-the-art in 3D object detection
Maintains competitive processing speeds for real-time use
Demonstrates effectiveness on the MVTec 3D-AD benchmark
Abstract
This study introduces a method for efficiently detecting objects within 3D point clouds using convolutional neural networks (CNNs). Our approach adopts a feature-centric voting mechanism to construct convolutional layers that exploit the sparsity typical of the input data. We explore the trade-off between accuracy and speed across different network architectures and advocate an $L_1$ penalty on filter activations to increase sparsity in intermediate layers. This work is the first to propose sparse convolutional layers combined with $L_1$ regularization for processing large-scale 3D data. We demonstrate the method's efficacy on the MVTec 3D-AD object detection benchmark. The Vote3Deep models, with just three layers, outperform the previous state-of-the-art in both laser-only approaches and combined…
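The voting idea can be sketched as follows: instead of sliding a filter over every output cell, each occupied input cell casts weighted votes into its neighbourhood, so the cost scales with the number of non-empty cells. This is a minimal single-channel NumPy sketch under stated assumptions (a cubic kernel with odd side length, zero padding, no bias or nonlinearity); the function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def voting_conv3d(grid, kernel):
    """Sparse 3D convolution via feature-centric voting (single channel)."""
    r = kernel.shape[0] // 2  # kernel radius; assumes a cubic, odd-sized kernel
    out = np.zeros(grid.shape, dtype=float)
    # Only occupied cells cast votes; empty cells are never visited.
    for x, y, z in np.argwhere(grid != 0):
        value = grid[x, y, z]
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    ox, oy, oz = x + dx, y + dy, z + dz
                    inside = (0 <= ox < grid.shape[0]
                              and 0 <= oy < grid.shape[1]
                              and 0 <= oz < grid.shape[2])
                    if inside:
                        # Vote weights are the kernel flipped along each axis.
                        out[ox, oy, oz] += kernel[r - dx, r - dy, r - dz] * value
    return out

def dense_conv3d(grid, kernel):
    """Reference sliding-window version with zero padding, for comparison."""
    k = kernel.shape[0]
    padded = np.pad(grid, k // 2)
    out = np.zeros(grid.shape, dtype=float)
    for x, y, z in np.ndindex(grid.shape):
        out[x, y, z] = np.sum(padded[x:x + k, y:y + k, z:z + k] * kernel)
    return out

# Toy data: a mostly empty 6x6x6 grid and a random 3x3x3 filter.
rng = np.random.default_rng(0)
grid = np.zeros((6, 6, 6))
occupied = rng.random(grid.shape) < 0.05
grid[occupied] = rng.normal(size=int(occupied.sum()))
kernel = rng.normal(size=(3, 3, 3))

same = np.allclose(voting_conv3d(grid, kernel), dense_conv3d(grid, kernel))
```

Both functions compute the same result; the voting version simply does work proportional to the number of occupied cells rather than to the full grid volume.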
Taxonomy
Topics: Advanced Neural Network Applications · 3D Surveying and Cultural Heritage · Remote Sensing and LiDAR Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Feature-Centric Voting
