Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting
Xiaomeng Chu, Jiajun Deng, Yao Li, Zhenxun Yuan, Yanyong Zhang, Jianmin Ji, Yu Zhang

TL;DR
This paper introduces Neighbor-Vote, a novel method that improves monocular 3D object detection by leveraging neighbor predictions and consensus voting to correct pseudo-LiDAR point cloud inaccuracies.
Contribution
It proposes a neighbor-voting technique that enhances monocular 3D detection accuracy by aggregating local predictions and encoding ROI scores into pseudo-LiDAR points.
Findings
Outperforms state-of-the-art methods on the KITTI benchmark, especially in hard cases.
Significantly improves 3D detection accuracy in bird's eye view.
Effectively reduces position shifts caused by monocular depth estimation errors.
Abstract
As cameras are increasingly deployed in new application domains such as autonomous driving, performing 3D object detection on monocular images becomes an important task for visual scene understanding. Recent advances in monocular 3D object detection mainly rely on "pseudo-LiDAR" generation, which performs monocular depth estimation and lifts the 2D pixels to pseudo 3D points. However, depth estimation from monocular images, due to its poor accuracy, leads to an inevitable position shift of pseudo-LiDAR points within the object. Therefore, the predicted bounding boxes may suffer from inaccurate location and deformed shape. In this paper, we present a novel neighbor-voting method that incorporates neighbor predictions to ameliorate object detection from severely deformed pseudo-LiDAR point clouds. Specifically, each feature point around the object forms its own prediction, and then…
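The pseudo-LiDAR lifting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a standard pinhole camera model; the function name `depth_to_pseudo_lidar` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration and are not taken from the paper. The sketch covers only the back-projection of a depth map to 3D points, not the neighbor-voting method itself.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to (N, 3) camera-frame points.

    Assumes a pinhole camera: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. These names are illustrative.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels with no valid (positive) depth.
    return points[points[:, 2] > 0]

# Toy example: a flat 2x3 depth map at 10 m.
depth = np.full((2, 3), 10.0)
pts = depth_to_pseudo_lidar(depth, fx=2.0, fy=2.0, cx=1.0, cy=0.5)

# A 10% depth overestimate moves every point 10% farther along its ray.
pts_biased = depth_to_pseudo_lidar(depth * 1.1, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
```

Because x and y are both proportional to the estimated depth, any depth error displaces a point along its viewing ray, which is one way to see the position shift the abstract describes.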
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection
