
TL;DR
This thesis introduces methods that improve the robustness and generalization of monocular 3D object detection (Mono3D) models across diverse scenarios, including occlusions, new datasets, object sizes, and camera parameters. It does so with a differentiable NMS, depth-equivariant backbones, and bird's-eye-view segmentation.
Contribution
It proposes three techniques, GrooMeD-NMS, DEVIANT backbones, and SeaBird segmentation, that improve Mono3D robustness and generalization beyond existing approaches.
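The pinhole camera model suggests why depth equivariance matters for DEVIANT. When an object moves along the depth axis, its image gets larger or smaller instead of shifting, so a backbone that is only equivariant to 2D translation does not handle depth shifts consistently. The sketch below is only an illustration with hypothetical values (a 700-pixel focal length and a 1.5 m tall object). It does not reproduce the DEVIANT architecture.

```python
# Illustrative sketch (not DEVIANT code): under a pinhole camera, moving an
# object along depth rescales its image footprint instead of translating it.

def project(point_3d, focal, cx=0.0, cy=0.0):
    """Project a camera-frame 3D point (X, Y, Z) to pixel coordinates (u, v)."""
    x, y, z = point_3d
    return (focal * x / z + cx, focal * y / z + cy)

def projected_height(obj_height, depth, focal):
    """Image-plane height in pixels of an upright object centered on the optical axis."""
    top = project((0.0, -obj_height / 2, depth), focal)
    bottom = project((0.0, obj_height / 2, depth), focal)
    return bottom[1] - top[1]

focal = 700.0  # hypothetical focal length (pixels)
h = 1.5        # hypothetical object height (meters)

near = projected_height(h, 10.0, focal)  # object 10 m away
far = projected_height(h, 20.0, focal)   # same object 20 m away
print(near, far, near / far)  # prints 105.0 52.5 2.0
```

Doubling the depth halves the projected height. Doubling the focal length would double it, because projected size scales with `focal / depth`. This is why camera intrinsics also matter for cross-dataset generalization.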
Findings
GrooMeD-NMS improves occlusion robustness.
DEVIANT backbones enhance dataset generalization.
SeaBird reduces noise sensitivity in large object detection.
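SeaBird treats large-object detection as segmentation in bird's-eye view (BEV). A common objective for this kind of segmentation is the soft Dice loss, which measures relative overlap rather than absolute error. The sketch below assumes that loss and is not SeaBird's training code. The `eps` smoothing term and the toy BEV grids are hypothetical. The example shows that missing the same fraction of an object's cells gives the same loss, whether the object is small or large.

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between predicted probabilities and a binary mask.

    pred, target: same-shape arrays with values in [0, 1], e.g. a BEV
    occupancy map for one object class. eps avoids division by zero.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

# Hypothetical BEV grids: a small 4-cell object and a large 256-cell object.
small = np.zeros((8, 8)); small[2:4, 2:4] = 1.0
large = np.zeros((32, 32)); large[4:20, 4:20] = 1.0

# Predictions that miss exactly half of each object's cells.
small_pred = small.copy(); small_pred[2, 2:4] = 0.0
large_pred = large.copy(); large_pred[4:12, 4:20] = 0.0

loss_small = soft_dice_loss(small_pred, small)
loss_large = soft_dice_loss(large_pred, large)
loss_perfect = soft_dice_loss(small, small)
print(round(loss_small, 4), round(loss_large, 4), loss_perfect)
```

Both partial predictions score about 1/3, because Dice depends only on the overlap ratio. A per-cell or per-coordinate regression error, by contrast, grows with the absolute size of the mistake.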
Abstract
Monocular 3D object detection (Mono3D) is a fundamental computer vision task that estimates an object's class, 3D position, dimensions, and orientation from a single image. Its applications, including autonomous driving, augmented reality, and robotics, critically rely on accurate 3D environmental understanding. This thesis addresses the challenge of generalizing Mono3D models to diverse scenarios, including occlusions, datasets, object sizes, and camera parameters. To enhance occlusion robustness, we propose a mathematically differentiable non-maximum suppression (GrooMeD-NMS). To improve generalization to new datasets, we explore depth-equivariant (DEVIANT) backbones. We also address large-object detection, demonstrating that it is not solely a data-imbalance or receptive-field problem but also a noise-sensitivity issue. To mitigate this, we introduce a segmentation-based approach in bird's-eye…
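The sketch below illustrates the idea behind a differentiable NMS. Standard NMS makes hard keep-or-discard decisions. Here, boxes are instead rescored with matrix operations built from smooth functions of pairwise IoU. This is a loose illustration in the spirit of GrooMeD-NMS, not its exact formulation. It omits any grouping step, and the sigmoid pruning function, the `iou_thresh` and `temperature` values, and the function names are all assumptions. NumPy is used for readability. The same operations have (sub)gradients, so they could be written in an autodiff framework.

```python
import numpy as np

def pairwise_iou(boxes):
    """IoU matrix for axis-aligned 2D boxes given as rows (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / union

def soft_matrix_nms(boxes, scores, iou_thresh=0.5, temperature=0.05):
    """Hypothetical matrix-form NMS rescoring: r = clip((I - P) s, 0, 1)."""
    order = np.argsort(-scores)              # highest score first
    b, s = boxes[order], scores[order]
    iou = pairwise_iou(b)
    # Smooth pruning weight: near 1 well above the IoU threshold, near 0 below.
    prune = 1.0 / (1.0 + np.exp(-(iou - iou_thresh) / temperature))
    # Strictly lower-triangular: only higher-scoring boxes suppress a box.
    P = np.tril(prune, k=-1)
    rescored = np.clip((np.eye(len(s)) - P) @ s, 0.0, 1.0)
    out = np.empty_like(rescored)
    out[order] = rescored                    # restore the original box order
    return out

boxes = np.array([[0, 0, 10, 10],            # strong detection
                  [1, 1, 11, 11],            # heavy-overlap duplicate
                  [20, 20, 30, 30]],         # separate object
                 dtype=float)
scores = np.array([0.9, 0.8, 0.7])
out = soft_matrix_nms(boxes, scores)
print(np.round(out, 4))
```

The duplicate box (IoU about 0.68 with the top box) is driven to a score of 0, and the separate object keeps a score close to 0.7. As a simplification, suppression uses the original scores `s` rather than the updated ones, which makes it a single linear step instead of the sequential greedy loop.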
