Attention-based Proposals Refinement for 3D Object Detection
Minh-Quan Dao, Elwan Héry, Vincent Frémont

TL;DR
This paper introduces APRO3D-Net, a data-driven refinement method for 3D object detection built on Vector Attention. It achieves high accuracy on the KITTI and NuScenes benchmarks with fewer parameters and real-time inference speed.
Contribution
It proposes a novel Vector Attention-based refinement stage for 3D detection that reduces manual tuning and better models the relation between the pooled points and their ROI.
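The refinement stage relates points pooled from each proposal to that ROI. This summary does not specify the exact pooling scheme, so the sketch below is only an illustration under stated assumptions: an ROI is taken to be an oriented box (center, size, yaw about the z-axis), and the LiDAR points inside it are gathered and expressed in the box's local frame. The function name `pool_points_in_roi` and the box parameterization are hypothetical, not the authors' code.

```python
import numpy as np

def pool_points_in_roi(points, center, size, yaw):
    """Hypothetical helper: return points inside an oriented 3D box, in the box frame.

    points: (N, 3) xyz coordinates.
    center: (3,) box center; size: (3,) length, width, height; yaw: rotation about z (radians).
    Returns (local_points_inside, boolean_mask).
    """
    # Translate so the box center becomes the origin.
    shifted = points - np.asarray(center, dtype=float)
    # Rotate by -yaw so the box axes align with the x/y axes.
    c, s = np.cos(-yaw), np.sin(-yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    local = shifted @ rot.T
    # A point is inside if every local coordinate is within half the box extent.
    half = np.asarray(size, dtype=float) / 2.0
    mask = np.all(np.abs(local) <= half, axis=1)
    return local[mask], mask

# Usage: a 4 x 2 x 2 box rotated 90 degrees, so its long axis points along world y.
points = np.array([[0.5, 0.0, 0.0], [0.0, 1.5, 0.0], [5.0, 5.0, 0.0]])
pooled, mask = pool_points_in_roi(points, center=(0.0, 0.0, 0.0), size=(4.0, 2.0, 2.0), yaw=np.pi / 2)
print(mask.tolist())
```

Expressing the pooled points in the box frame makes them invariant to where the proposal sits in the scene, which is one common reason ROI refinement stages work in local coordinates.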
Findings
Achieves 84.85 AP for Car on KITTI at moderate difficulty.
Attains 47.03 mAP on NuScenes across 10 classes.
Runs at 15 FPS while using fewer parameters than comparable methods.
Abstract
Recent advances in 3D object detection have been made by developing the refinement stage for voxel-based Region Proposal Networks (RPN) to better balance accuracy and efficiency. A popular approach among state-of-the-art frameworks is to divide proposals, or Regions of Interest (ROI), into grids and extract features for each grid location before synthesizing them into ROI features. While achieving impressive performance, such an approach involves several hand-crafted components (e.g. grid sampling, set abstraction) which require expert knowledge to be tuned correctly. This paper proposes a data-driven approach to ROI feature computation named APRO3D-Net, which consists of a voxel-based RPN and a refinement stage made of Vector Attention. Unlike the original multi-head attention, Vector Attention assigns different weights to different channels within a point feature, thus…
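To make the contrast with multi-head (scalar) attention concrete, here is a minimal NumPy sketch, assuming a single ROI query vector attending to the features of its M pooled points. The vector-attention form follows the subtraction-relation formulation popularized by Point Transformer (an MLP maps `q - k_j` to one logit per channel); the positional encoding is omitted, and the function names, MLP weights `w1`/`w2`, and sizes are hypothetical rather than APRO3D-Net's actual layers.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scalar_attention(q, k, v):
    """Dot-product attention: one weight per pooled point, shared by all channels.
    q: (C,), k: (M, C), v: (M, C) -> output (C,), weights (M,)."""
    w = softmax(k @ q / np.sqrt(q.shape[0]), axis=0)
    return w @ v, w

def vector_attention(q, k, v, w1, w2):
    """Vector attention: a separate weight per pooled point AND per channel.
    The relation q - k_j goes through a two-layer MLP (w1, w2) producing C logits,
    which are normalized over the M points independently for every channel."""
    rel = q[None, :] - k                 # (M, C) subtraction relation
    hidden = np.maximum(rel @ w1, 0.0)   # (M, H) ReLU hidden layer
    logits = hidden @ w2                 # (M, C) one logit per channel
    w = softmax(logits, axis=0)          # normalize over points, per channel
    return (w * v).sum(axis=0), w        # channel-wise weighted sum

# Usage with random features: C channels, M pooled points, H hidden units.
rng = np.random.default_rng(0)
C, M, H = 8, 16, 32
q = rng.standard_normal(C)
k = rng.standard_normal((M, C))
v = rng.standard_normal((M, C))
w1 = rng.standard_normal((C, H))
w2 = rng.standard_normal((H, C))

y_s, a_s = scalar_attention(q, k, v)
y_v, a_v = vector_attention(q, k, v, w1, w2)
print(a_s.shape, a_v.shape)  # (16,) (16, 8)
```

The shapes of the weight arrays show the difference: scalar attention yields one weight per point, whereas vector attention yields an (M, C) matrix, so each channel of the ROI feature can draw on a different subset of pooled points.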
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Region Proposal Network
