TL;DR
This paper introduces MonoASRH, a novel monocular 3D object detection framework that uses global feature aggregation and scale-aware regression to improve detection accuracy, especially for small and variably scaled objects.
Contribution
The paper proposes MonoASRH with EH-FAM and ASRH modules, enabling global semantic feature extraction and dynamic scale-aware 3D regression, advancing monocular 3D detection methods.
Findings
Achieves state-of-the-art results on KITTI and Waymo datasets.
Effectively detects small-scale and variably scaled objects.
Improves feature representation through global receptive fields.
Abstract
Monocular 3D object detection has attracted great attention due to its simplicity and low cost. Existing methods typically follow conventional 2D detection paradigms, first locating object centers and then predicting 3D attributes via neighboring features. However, these methods predominantly rely on progressive cross-scale feature aggregation and focus solely on local information, which may result in a lack of global awareness and the omission of small-scale objects. In addition, due to large variation in object scales across different scenes and depths, inaccurate receptive fields often lead to background noise and degraded feature representation. To address these issues, we introduce MonoASRH, a novel monocular 3D detection framework composed of an Efficient Hybrid Feature Aggregation Module (EH-FAM) and an Adaptive Scale-Aware 3D Regression Head (ASRH). Specifically, EH-FAM employs…
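The conventional center-based paradigm the abstract refers to (locate object centers, then read features at those locations for 3D attribute prediction) can be illustrated with a minimal sketch. This is not MonoASRH's EH-FAM or ASRH; it only shows the baseline idea. The function names `find_centers` and `gather_features`, the 3x3 peak rule, the 0.5 threshold, and the toy inputs are all assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]


def find_centers(heatmap: Grid, threshold: float = 0.5,
                 top_k: int = 10) -> List[Tuple[int, int, float]]:
    """Return up to top_k (row, col, score) peaks: cells scoring at least
    `threshold` that no 8-connected neighbour exceeds (assumed peak rule)."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    # Strongest candidates first, then keep at most top_k of them.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks[:top_k]


def gather_features(features: List[Grid],
                    centers: List[Tuple[int, int, float]]) -> List[List[float]]:
    """Read the C-dimensional feature vector at each center from a
    channel-first feature map (C x H x W)."""
    return [[channel[r][c] for channel in features] for r, c, _ in centers]


# Toy 4x4 center heatmap with two clear peaks (hypothetical values).
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.3],
]
# Toy 2-channel feature map: channel 0 is the flat index, channel 1 its negation.
features = [
    [[float(r * 4 + c) for c in range(4)] for r in range(4)],
    [[float(-(r * 4 + c)) for c in range(4)] for r in range(4)],
]

centers = find_centers(heatmap)
vectors = gather_features(features, centers)
print(centers)
print(vectors)
```

In a detector of this kind, each gathered vector would be passed to a regression head that predicts the 3D attributes. Because the vector is read from a single fixed location, it carries the same spatial extent of context regardless of object size, which is the receptive-field limitation the abstract describes.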
Taxonomy
Methods: Linear Layer · Softmax · Focus · Attention Is All You Need · Multi-Head Attention
