Generalizing Monocular 3D Object Detection

Abhinav Kumar

arXiv:2508.19593 · cs.CV · August 28, 2025

TL;DR

This thesis introduces methods that improve the robustness and generalization of monocular 3D object detection (Mono3D) models across diverse scenarios, including occlusions, new datasets, object sizes, and camera parameters. It does so through a differentiable NMS, depth-equivariant backbones, and bird's-eye-view segmentation.

Contribution

The thesis proposes three techniques that make Mono3D models more robust and better at generalizing than existing approaches: GrooMeD-NMS, DEVIANT backbones, and SeaBird segmentation.
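GrooMeD-NMS makes non-maximum suppression (NMS) differentiable so that it can take part in training. Its exact formulation is not reproduced on this page, so the sketch below substitutes a generic smooth relaxation to show the idea. It replaces the hard rule "drop a box if its IoU with a higher-scored box exceeds a threshold" with a sigmoid gate, so the rescored confidences vary smoothly with the input scores and overlaps. The names `soft_suppress`, `iou_thresh`, and `temperature` are hypothetical, and this is not the GrooMeD-NMS algorithm.

```python
import math
from typing import List, Sequence, Tuple

# Axis-aligned 2D box as (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_suppress(boxes: Sequence[Box], scores: Sequence[float],
                  iou_thresh: float = 0.5, temperature: float = 0.05) -> List[float]:
    """Hypothetical smooth NMS relaxation (illustrative, not GrooMeD-NMS).

    Boxes are visited in descending score order; the sort itself is treated
    as fixed. Each box is damped by every higher-ranked box in proportion to
    that box's rescored confidence and a sigmoid gate on their IoU, so the
    outputs vary smoothly with the scores and overlaps.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rescored = [0.0] * len(scores)
    for rank, i in enumerate(order):
        keep = scores[i]
        for j in order[:rank]:
            # Soft stand-in for the hard rule "suppress if IoU > iou_thresh".
            gate = 1.0 / (1.0 + math.exp(-(iou(boxes[i], boxes[j]) - iou_thresh) / temperature))
            keep *= 1.0 - rescored[j] * gate
        rescored[i] = keep
    return rescored


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
# The overlapping second box is strongly damped; the distant third box is nearly untouched.
print(soft_suppress(boxes, [0.9, 0.8, 0.7]))
```

Lowering `temperature` pushes the gate toward the hard threshold of classical NMS. Raising it gives smoother, but less selective, suppression.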

Findings

01. GrooMeD-NMS improves occlusion robustness.
02. DEVIANT backbones enhance dataset generalization.
03. SeaBird reduces noise sensitivity in large-object detection.
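The second finding concerns depth-equivariant backbones, and the thesis also targets changes in camera parameters. Both rest on the pinhole model. An object of metric height H at depth Z projects to about h = f_y * H / Z pixels, so moving it along depth rescales its image. The sketch below checks this arithmetic. It illustrates the geometry only and is not the DEVIANT architecture. The intrinsic values and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (example values below are hypothetical)."""
    fx: float
    fy: float
    cx: float
    cy: float


def project(point: Tuple[float, float, float], K: Intrinsics) -> Tuple[float, float]:
    """Project a camera-frame point (X, Y, Z), with Z > 0 in meters, to pixel coordinates."""
    X, Y, Z = point
    return (K.fx * X / Z + K.cx, K.fy * Y / Z + K.cy)


def projected_height(height_m: float, depth_m: float, K: Intrinsics) -> float:
    """Pixel height of a vertical segment of metric height H centered at depth Z."""
    top = project((0.0, -height_m / 2, depth_m), K)
    bottom = project((0.0, height_m / 2, depth_m), K)
    return bottom[1] - top[1]  # equals K.fy * height_m / depth_m


K = Intrinsics(fx=700.0, fy=700.0, cx=600.0, cy=375.0)
near = projected_height(1.5, 10.0, K)  # 1.5 m tall object at 10 m
far = projected_height(1.5, 20.0, K)   # same object moved to 20 m
print(near, far, near / far)           # prints 105.0 52.5 2.0
```

The same relation shows that doubling f_y while also doubling Z leaves h unchanged. Image size alone therefore cannot separate a depth change from a focal-length change, which is one way camera parameters enter the generalization problem.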

Abstract

Monocular 3D object detection (Mono3D) is a fundamental computer vision task that estimates an object's class, 3D position, dimensions, and orientation from a single image. Its applications, including autonomous driving, augmented reality, and robotics, critically rely on accurate 3D environmental understanding. This thesis addresses the challenge of generalizing Mono3D models to diverse scenarios, including occlusions, datasets, object sizes, and camera parameters. To enhance occlusion robustness, we propose a mathematically differentiable NMS (GrooMeD-NMS). To improve generalization to new datasets, we explore depth-equivariant (DEVIANT) backbones. We also address the detection of large objects, demonstrating that it is not solely a data imbalance or receptive field problem but also a noise sensitivity issue. To mitigate this, we introduce a segmentation-based approach in bird's-eye…
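The abstract's final method is a segmentation-based approach in bird's-eye view (BEV). The text here does not specify the grid or the segmentation loss. As an illustration only, the sketch below assumes a rasterized BEV occupancy mask and a Dice-style overlap loss, then compares how the same 1 m localization error is penalized on a car-sized footprint and on a bus-sized one. Because Dice normalizes the overlap by the total footprint area, the larger object incurs a smaller loss for the same metric error. The function names, grid layout, and object sizes are hypothetical. This is not the SeaBird implementation, and it does not reproduce the thesis's noise-sensitivity analysis.

```python
import math
from typing import List

Grid = List[List[float]]


def rasterize_footprint(x: float, z: float, length: float, width: float, yaw: float,
                        grid_size: int = 20, cell_m: float = 1.0) -> Grid:
    """Binary BEV mask of a rotated box footprint (hypothetical grid layout).

    Rows index forward distance z in [0, grid_size * cell_m); columns index
    lateral x in [-grid_size * cell_m / 2, grid_size * cell_m / 2). Yaw is
    measured from the lateral x axis. A cell is set when its center lies
    inside the footprint.
    """
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    c, s = math.cos(yaw), math.sin(yaw)
    for row in range(grid_size):
        for col in range(grid_size):
            px = (col + 0.5) * cell_m - grid_size * cell_m / 2
            pz = (row + 0.5) * cell_m
            dx, dz = px - x, pz - z
            # Rotate the offset into the box frame: u along length, v along width.
            u = c * dx + s * dz
            v = -s * dx + c * dz
            if abs(u) <= length / 2 and abs(v) <= width / 2:
                grid[row][col] = 1.0
    return grid


def dice_loss(pred: Grid, target: Grid, eps: float = 1e-6) -> float:
    """1 - Dice coefficient; pred may hold soft probabilities in [0, 1]."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 - (2.0 * inter + eps) / (total + eps)


# Same 1 m lateral error on a car-sized and a bus-sized footprint (sizes hypothetical).
car = rasterize_footprint(0.0, 10.0, 4.0, 2.0, 0.0)
car_shift = rasterize_footprint(1.0, 10.0, 4.0, 2.0, 0.0)
bus = rasterize_footprint(0.0, 10.0, 12.0, 4.0, 0.0)
bus_shift = rasterize_footprint(1.0, 10.0, 12.0, 4.0, 0.0)
print(dice_loss(car_shift, car), dice_loss(bus_shift, bus))  # about 0.25 vs about 0.083
```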
