HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection
Di Wu, Feng Yang, Benlian Xu, Pan Liao, Wenhui Zhao, Dingwen Zhang

TL;DR
HV-BEV introduces a novel decoupled feature sampling method for multi-view 3D object detection that separately handles horizontal aggregation and vertical height-aware sampling, improving detection accuracy especially for objects at different heights.
Contribution
The paper proposes HV-BEV, a new approach that decouples horizontal and vertical feature sampling in BEV-based 3D detection, enhancing object association and height awareness.
Findings
Achieves 50.5% mAP and 59.8% NDS on the nuScenes test set.
Outperforms baseline methods in multi-view 3D detection tasks.
Effectively models height variation and object association across views.
Abstract
The application of vision-based multi-view environmental perception systems has been increasingly recognized in autonomous driving technology, especially for BEV-based models. Current state-of-the-art solutions primarily encode image features from each camera view into the BEV space through explicit or implicit depth prediction. However, these methods often overlook the structured correlations among different parts of objects in 3D space and the fact that different categories of objects often occupy distinct local height ranges. For example, trucks appear at higher elevations, whereas traffic cones are near the ground. In this work, we propose a novel approach that decouples feature sampling in the BEV grid queries paradigm into Horizontal feature aggregation and Vertical adaptive height-aware reference point sampling (HV-BEV), aiming to improve both the…
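To make the idea of vertical, height-aware reference points more concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the height ranges, the evenly spaced sampling, and the front-camera pinhole model (camera at the ego origin, x forward, y left, z up) are all assumptions chosen for illustration. It places a column of 3D reference points above one BEV grid cell and projects them into an image, which is where image features would then be sampled.

```python
# Illustrative sketch only; names and numbers are hypothetical,
# not taken from the HV-BEV paper.

# Hypothetical height ranges in metres (ego frame, z up).
HEIGHT_RANGES = {
    "near_ground": (0.0, 1.0),  # e.g. small objects close to the road
    "tall": (0.0, 4.0),         # e.g. large vehicles
}


def vertical_reference_points(x, y, z_min, z_max, num_points):
    """Return num_points 3D points above BEV cell (x, y),
    evenly spaced in height between z_min and z_max."""
    if num_points < 1:
        raise ValueError("num_points must be at least 1")
    if num_points == 1:
        return [(x, y, (z_min + z_max) / 2.0)]
    step = (z_max - z_min) / (num_points - 1)
    return [(x, y, z_min + i * step) for i in range(num_points)]


def ego_to_front_camera(point):
    """Map an ego-frame point (x forward, y left, z up) to an assumed
    front-camera frame (X right, Y down, Z forward) at the ego origin."""
    x, y, z = point
    return (-y, -z, x)


def project_pinhole(cam_point, focal, cx, cy):
    """Project a camera-frame point to pixel coordinates.
    Returns None when the point lies behind the camera."""
    X, Y, Z = cam_point
    if Z <= 0:
        return None
    return (focal * X / Z + cx, focal * Y / Z + cy)


if __name__ == "__main__":
    z_min, z_max = HEIGHT_RANGES["tall"]
    for p in vertical_reference_points(10.0, 2.0, z_min, z_max, 5):
        uv = project_pinhole(ego_to_front_camera(p), 1000.0, 800.0, 450.0)
        print(p, "->", uv)
```

In this sketch the height range is fixed per column. An adaptive scheme, as the abstract describes, would instead choose the range or the individual heights per query, so that a cell likely to contain a tall object is sampled over a different vertical extent than one likely to contain a low object.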
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Sparse Evolutionary Training · Focus
