MonoNext: A 3D Monocular Object Detection with ConvNext
Marcelo Eduardo Pederiva, Jos\'e Mario De Martino, Alessandro, Zimmer

TL;DR
MonoNext introduces a ConvNext-based monocular 3D object detection model that uses a spatial grid and multi-task learning, achieving high accuracy with less computational cost on the KITTI dataset.
Contribution
The paper presents MonoNext, a novel multi-task learning approach utilizing ConvNext for monocular 3D detection with a simple spatial grid, requiring only 3D bounding box annotations.
Findings
MonoNext achieves high precision on KITTI dataset.
Adding more training data improves MonoNext's accuracy.
Performance is comparable to state-of-the-art methods.
Abstract
Autonomous driving perception tasks rely heavily on cameras as the primary sensor for Object Detection, Semantic Segmentation, Instance Segmentation, and Object Tracking. However, RGB images captured by cameras lack depth information, which poses a significant challenge in 3D detection tasks. To supplement this missing data, mapping sensors such as LIDAR and RADAR are used for accurate 3D Object Detection. Despite their significant accuracy, the multi-sensor models are expensive and require a high computational demand. In contrast, Monocular 3D Object Detection models are becoming increasingly popular, offering a faster, cheaper, and easier-to-implement solution for 3D detections. This paper introduces a different Multi-Tasking Learning approach called MonoNext that utilizes a spatial grid to map objects in the scene. MonoNext employs a straightforward approach based on the ConvNext…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods
MethodsConvNeXt
