TL;DR
MoCA3D is a monocular, class-agnostic 3D object detection model that predicts projected 3D box corners and per-corner depths in the image plane without camera intrinsics at inference time, improving image-plane geometric fidelity for downstream applications that depend on it.
Contribution
MoCA3D introduces a class-agnostic, inference-time intrinsics-free approach for predicting projected 3D bounding box corners and depths from monocular images.
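For context, the conventional route to image-plane corners goes through the camera intrinsics. The sketch below is not taken from the paper: it uses a standard pinhole model, and the helper names (`box_corners`, `project`), the axis-aligned box, and the focal lengths are hypothetical values chosen for illustration. It shows that the same 3D box lands on different pixels when the assumed focal length changes, which is why projected corners are hard to obtain when intrinsics are unknown.

```python
from itertools import product

def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box in camera coordinates.

    Illustrative helper (not from the paper); real boxes usually also
    carry a rotation, which is omitted here for brevity.
    """
    cx, cy, cz = center
    w, h, l = size
    return [
        (cx + sx * w / 2, cy + sy * h / 2, cz + sz * l / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]

def project(points, fx, fy, px, py):
    """Standard pinhole projection: (X, Y, Z) -> (u, v, depth)."""
    return [(fx * X / Z + px, fy * Y / Z + py, Z) for X, Y, Z in points]

# One box, two hypothetical cameras that differ only in focal length.
corners = box_corners(center=(0.0, 0.0, 10.0), size=(2.0, 2.0, 4.0))
pixels_a = project(corners, fx=1000.0, fy=1000.0, px=640.0, py=360.0)
pixels_b = project(corners, fx=500.0, fy=500.0, px=640.0, py=360.0)

print(pixels_a[0])  # first corner under camera A
print(pixels_b[0])  # same corner under camera B, at a different pixel
```

A model that regresses only a 3D box therefore needs the intrinsics to recover pixel-space corners, whereas predicting the corners directly in the image plane avoids that step.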
Findings
Achieves state-of-the-art image-plane corner accuracy, improving image-plane corner PAG (Pixel-Aligned Geometry) by 22.8%.
Maintains competitive 3D IoU performance with significantly fewer parameters.
Enables downstream tasks previously infeasible without known camera intrinsics.
Abstract
Monocular 3D object understanding has largely been cast as a 2D RoI-to-3D box lifting problem. However, emerging downstream applications require image-plane geometry (e.g., projected 3D box corners), which cannot be easily obtained without known intrinsics, a problem for object detection in the wild. We introduce MoCA3D, a Monocular, Class-Agnostic 3D model that predicts projected 3D bounding box corners and per-corner depths without requiring camera intrinsics at inference time. MoCA3D formulates pixel-space localization and depth assignment as dense prediction via corner heatmaps and depth maps. To evaluate image-plane geometric fidelity, we propose Pixel-Aligned Geometry (PAG), which directly measures image-plane corner and depth consistency. Extensive experiments demonstrate that MoCA3D achieves state-of-the-art performance, improving image-plane corner PAG by 22.8% while remaining…
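The dense-prediction formulation in the abstract (corner heatmaps plus depth maps) can be illustrated with a minimal decoding sketch. The abstract does not describe the decoding step, so everything here is an assumption: one heatmap and one depth map per corner, a plain argmax peak, and the hypothetical function name `decode_corners`. The actual model may use a different layout or sub-pixel refinement.

```python
def decode_corners(heatmaps, depth_maps):
    """Decode (u, v, depth) for each corner from dense maps.

    Assumed layout (not confirmed by the source): heatmaps and depth_maps
    are lists with one 2D grid (list of rows) per corner, all the same
    shape. u is the column index and v is the row index of the peak.
    """
    corners = []
    for heat, depth in zip(heatmaps, depth_maps):
        # Argmax over all pixels of this corner's heatmap.
        best_v, best_u = max(
            ((v, u) for v in range(len(heat)) for u in range(len(heat[0]))),
            key=lambda vu: heat[vu[0]][vu[1]],
        )
        # Read the depth prediction at the peak location.
        corners.append((best_u, best_v, depth[best_v][best_u]))
    return corners

# Toy 3x4 maps for two corners (values are made up for illustration).
heatmaps = [
    [[0.0, 0.1, 0.0, 0.0], [0.0, 0.2, 0.9, 0.1], [0.0, 0.0, 0.1, 0.0]],
    [[0.1, 0.0, 0.0, 0.0], [0.3, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]],
]
depth_maps = [
    [[5.0, 5.0, 5.0, 5.0], [5.0, 5.0, 7.5, 5.0], [5.0, 5.0, 5.0, 5.0]],
    [[4.0, 4.0, 4.0, 4.0], [4.0, 4.0, 4.0, 4.0], [3.0, 4.0, 4.0, 4.0]],
]

decoded = decode_corners(heatmaps, depth_maps)
print(decoded)
```

Because both the corner location and its depth are read directly from image-aligned maps, no camera intrinsics enter the decoding step in this sketch.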
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
