Efficient Feature Aggregation and Scale-Aware Regression for Monocular 3D Object Detection

Yifan Wang; Xiaochen Yang; Fanqi Pu; Qingmin Liao; Wenming Yang

arXiv:2411.02747 · cs.CV · February 16, 2026

TL;DR

This paper introduces MonoASRH, a novel monocular 3D object detection framework that uses global feature aggregation and scale-aware regression to improve detection accuracy, especially for small and variably scaled objects.

Contribution

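The paper proposes MonoASRH, built from two modules: the Efficient Hybrid Feature Aggregation Module (EH-FAM) and the Adaptive Scale-Aware 3D Regression Head (ASRH). Together they enable global semantic feature extraction and dynamic scale-aware 3D regression, advancing monocular 3D detection methods.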
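The page does not spell out how ASRH adapts its receptive field, so the toy sketch below only illustrates the intuition behind "scale-aware" regression. It is not the paper's method. A plain 3×3 kernel always samples the same pixel footprint. A scale-aware variant, by contrast, can spread its sampling points in proportion to the object's 2D box. The names `fixed_grid` and `scale_aware_grid`, the `coverage` parameter, and the hand-set spreading rule are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fixed_grid(cx: float, cy: float, k: int = 3,
               dilation: float = 1.0) -> List[Point]:
    """Sampling points of an ordinary k x k kernel centred at (cx, cy).

    The footprint covers the same number of pixels for every object.
    """
    r = (k - 1) / 2
    return [(cx + (i - r) * dilation, cy + (j - r) * dilation)
            for j in range(k) for i in range(k)]


def scale_aware_grid(cx: float, cy: float, box_w: float, box_h: float,
                     k: int = 3, coverage: float = 0.5) -> List[Point]:
    """Hypothetical scale-aware sampling (an illustration, NOT the paper's ASRH).

    Spreads the k x k points so they span `coverage` of the object's 2D box
    width and height. Small boxes get a tight footprint and large boxes a
    wide one.
    """
    r = (k - 1) / 2
    # Spacing between neighbouring points; a single point needs no spacing.
    step_x = coverage * box_w / (k - 1) if k > 1 else 0.0
    step_y = coverage * box_h / (k - 1) if k > 1 else 0.0
    return [(cx + (i - r) * step_x, cy + (j - r) * step_y)
            for j in range(k) for i in range(k)]
```

For a distant car with an 8×8 px box, `scale_aware_grid(100, 50, 8, 8)` places the points 2 px apart. For a 120×60 px box, the points are 30 px apart horizontally and 15 px apart vertically. In both cases the samples stay on the object rather than drifting into the background. A learned head would predict such adjustments instead of fixing them with a formula.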

Findings

1. Achieves state-of-the-art results on the KITTI and Waymo datasets.
2. Effectively detects small-scale and variably scaled objects.
3. Improves feature representation through global receptive fields.
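Object scale varies so much with depth because of the pinhole camera model. An object of physical height H at depth z projects to about f·H/z pixels, where f is the focal length in pixels. The sketch below assumes f = 720 px (roughly the scale of the KITTI front cameras) and a 1.5 m tall car. Both numbers are illustrative assumptions, not values from the paper.

```python
def projected_height_px(focal_px: float, height_m: float, depth_m: float) -> float:
    """Pinhole model: image height in pixels of an object that is `height_m`
    tall and `depth_m` away along the optical axis."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * height_m / depth_m


FOCAL_PX = 720.0    # assumed focal length in pixels
CAR_HEIGHT_M = 1.5  # assumed typical car height in metres

for z in (10.0, 20.0, 40.0, 60.0):
    h = projected_height_px(FOCAL_PX, CAR_HEIGHT_M, z)
    print(f"depth {z:>4.0f} m -> {h:5.1f} px")
```

Under these assumptions, the same car shrinks from 108 px at 10 m to 18 px at 60 m, a 6× change in apparent size. This is why small, distant objects are easy to miss.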

Abstract

Monocular 3D object detection has attracted great attention due to its simplicity and low cost. Existing methods typically follow conventional 2D detection paradigms, first locating object centers and then predicting 3D attributes via neighboring features. However, these methods predominantly rely on progressive cross-scale feature aggregation and focus solely on local information, which may result in a lack of global awareness and the omission of small-scale objects. In addition, due to large variation in object scales across different scenes and depths, inaccurate receptive fields often lead to background noise and degraded feature representation. To address these issues, we introduce MonoASRH, a novel monocular 3D detection framework composed of an Efficient Hybrid Feature Aggregation Module (EH-FAM) and an Adaptive Scale-Aware 3D Regression Head (ASRH). Specifically, EH-FAM employs…
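The conventional paradigm described in the abstract (locate object centers, then read 3D attributes from the features around them) is commonly implemented CenterNet-style. The decoder finds local maxima of a center heatmap, then looks up regressed attribute maps at those positions. The pure-Python sketch below shows that generic decoding step. It is not MonoASRH's head. The function names, the 0.3 threshold, and the example `depth` attribute map are illustrative assumptions. Real implementations typically run the 3×3 max-pool suppression on GPU tensors.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def local_peaks(heatmap: Grid, threshold: float = 0.3) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for cells that reach `threshold` and are the
    maximum of their 3x3 neighbourhood (max-pool style suppression),
    sorted by descending score."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            # Clip the 3x3 window at the image border.
            neigh = [heatmap[rr][cc]
                     for rr in range(max(0, r - 1), min(h, r + 2))
                     for cc in range(max(0, c - 1), min(w, c + 2))]
            if v >= max(neigh):
                peaks.append((r, c, v))
    return sorted(peaks, key=lambda p: p[2], reverse=True)


def decode(heatmap: Grid, attr_maps: Dict[str, Grid],
           threshold: float = 0.3) -> List[Dict[str, object]]:
    """Turn each heatmap peak into a detection by reading every regressed
    attribute map (e.g. depth, dimensions, orientation) at the peak cell."""
    return [{"center": (r, c), "score": s,
             **{name: m[r][c] for name, m in attr_maps.items()}}
            for r, c, s in local_peaks(heatmap, threshold)]
```

Each detection's attributes come from the single feature location at its center. The quality of those attributes therefore depends on how much useful context that location's receptive field captured, which is the weakness the abstract targets.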

Code & Models

Repositories

WYFDUT/MonoASRH (PyTorch, official)

Taxonomy

Methods: Linear Layer · Softmax · Focus · Attention Is All You Need · Multi-Head Attention
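The taxonomy tags (Linear Layer, Softmax, Multi-Head Attention) point to attention as the mechanism behind the global semantic feature extraction the page attributes to MonoASRH. Below is a minimal single-head sketch of scaled dot-product attention, softmax(QKᵀ/√d)V. It is the generic operation from "Attention Is All You Need", not EH-FAM's actual configuration, and the helper names are hypothetical. In practice, Q, K and V would come from linear projections of a flattened H×W feature map, with one row per spatial position.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x m) @ (m x p) matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    mx = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention over all N tokens, no mask.

    Row i of the output is a softmax-weighted average of ALL rows of v,
    so every position can draw on every other position in one step.
    """
    d = len(q[0])
    scores = matmul(q, transpose(k))  # N x N similarity matrix
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)
```

Each output row mixes information from all N positions. A single attention layer therefore has a global receptive field, whereas a convolution's receptive field grows only as layers are stacked. Multi-head attention runs several such heads on separate linear projections and concatenates their outputs.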