To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels
Yuning Chai, Pei Sun, Jiquan Ngiam, Weiyue Wang, Benjamin Caine, Vijay Vasudevan, Xiao Zhang, Dragomir Anguelov

TL;DR
This paper introduces a novel 3D object detection method using range images and graph convolution kernels, achieving high accuracy with significantly reduced computational requirements.
Contribution
It proposes a new 2D convolutional network architecture with flexible kernels for 3D detection from range images, outperforming existing methods in efficiency and accuracy.
Findings
Outperforms the state of the art on the Waymo Open Dataset
Achieves 75.5% AP for pedestrians
Smallest model requires 180 times fewer FLOPS and model parameters than PointPillars
Abstract
3D object detection is vital for many robotics applications. For tasks where a 2D perspective range image exists, we propose to learn a 3D representation directly from this range image view. To this end, we designed a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network. Its layers can consume any arbitrary convolution kernel in place of the default inner product kernel and exploit the underlying local geometry around each pixel. We outline four such kernels: a dense kernel according to the bag-of-words paradigm, and three graph kernels inspired by recent graph neural network advances: the Transformer, the PointNet, and the Edge Convolution. We also explore cross-modality fusion with the camera image, facilitated by operating in the perspective range image view. Our method performs competitively on the Waymo Open Dataset…
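As a rough illustration of the idea in the abstract, the sketch below shows how a range-image layer might replace the inner-product kernel with a PointNet-style kernel that uses each neighbor's 3D offset from the center pixel. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the single shared linear layer, the ReLU, and the max aggregation are all illustrative choices, and azimuth wrap-around at the image border is ignored.

```python
import math


def spherical_to_cartesian(rng, azimuth, inclination):
    """Convert one range-image pixel (range, azimuth, inclination) to (x, y, z)."""
    x = rng * math.cos(inclination) * math.cos(azimuth)
    y = rng * math.cos(inclination) * math.sin(azimuth)
    z = rng * math.sin(inclination)
    return (x, y, z)


def pointnet_style_kernel(features, coords, row, col, weights, radius=1):
    """Aggregate the (2*radius+1)^2 window around pixel (row, col).

    features: H x W nested list of per-pixel feature lists (length C)
    coords:   H x W nested list of (x, y, z) tuples
    weights:  D x (C + 3) nested list, one shared linear layer (hypothetical)
    Returns a length-D list: max over neighbors of ReLU(W @ [feat, rel_xyz]).
    """
    height, width = len(features), len(features[0])
    cx, cy, cz = coords[row][col]
    # ReLU outputs are non-negative, so 0.0 is a safe starting value for max.
    out = [0.0] * len(weights)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = row + dr, col + dc
            if not (0 <= r < height and 0 <= c < width):
                continue  # skip neighbors outside the image (no wrap-around)
            nx, ny, nz = coords[r][c]
            # Neighbor feature concatenated with its 3D offset from the center.
            vec = list(features[r][c]) + [nx - cx, ny - cy, nz - cz]
            for d, w_row in enumerate(weights):
                act = max(0.0, sum(w * v for w, v in zip(w_row, vec)))
                out[d] = max(out[d], act)
    return out
```

Calling `pointnet_style_kernel` at every pixel yields a new H x W feature map, while the coordinate grid is passed along unchanged to the next layer. That is one way to read "carries the 3D spherical coordinates of each pixel throughout the network".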
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods
Methods: Multi-Head Attention · Attention Is All You Need · Graph Neural Network · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dropout · Layer Normalization · Adam
