MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane

Changwoo Jeon; Rishi Upadhyay; Achuta Kadambi

arXiv:2603.19538 · cs.CV · May 19, 2026

TL;DR

MoCA3D is a monocular 3D object detection model that predicts projected 3D box corners and per-corner depths in the image plane without requiring camera intrinsics at inference time, improving image-plane geometric fidelity and making its predictions directly usable in downstream applications.

Contribution

MoCA3D introduces a class-agnostic approach that predicts projected 3D bounding box corners and per-corner depths from monocular images without requiring camera intrinsics at inference time.
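To make the prediction targets concrete, here is a minimal sketch of how projected corners and per-corner depths relate to a 3D box under a standard pinhole camera. It is not MoCA3D's code: the box parameterization (center, dimensions, yaw about the y axis), the camera-axis convention, the corner ordering, and the helper names `box_corners_3d` and `project_corners` are assumptions for illustration. Generating such targets requires intrinsics (fx, fy, px, py); the point of MoCA3D is that the model predicts the resulting (u, v, depth) values directly, without those intrinsics at inference time.

```python
import math


def box_corners_3d(center, dims, yaw):
    """Return the 8 corners of a 3D box in camera coordinates.

    Assumed convention (hypothetical, not taken from the paper): x right,
    y down, z forward; dims = (width, height, length); yaw rotates the box
    about the camera y axis. Corner order is arbitrary here.
    """
    cx, cy, cz = center
    w, h, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-w / 2, w / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-l / 2, l / 2):
                # Rotate the local offset about the y axis, then translate.
                x = c * dx + s * dz
                z = -s * dx + c * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners


def project_corners(corners, fx, fy, px, py):
    """Pinhole projection of each 3D corner to (u, v, depth)."""
    projected = []
    for x, y, z in corners:
        if z <= 0:
            raise ValueError("corner lies behind the camera")
        projected.append((fx * x / z + px, fy * y / z + py, z))
    return projected
```

For a cube of side 2 centered at depth 10, seen by a camera with fx = fy = 100 and principal point (50, 50), the eight projected depths are 9 and 11, and the near top-left corner lands at about (38.9, 38.9).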

Findings

1. Achieves state-of-the-art image-plane corner geometric accuracy, improving PAG by 22.8%.

2. Maintains competitive 3D IoU performance with significantly fewer parameters.

3. Enables downstream tasks previously infeasible without known camera intrinsics.

Abstract

Monocular 3D object understanding has largely been cast as a 2D RoI-to-3D box lifting problem. However, emerging downstream applications require image-plane geometry (e.g., projected 3D box corners) which cannot be easily obtained without known intrinsics, a problem for object detection in the wild. We introduce MoCA3D, a Monocular, Class-Agnostic 3D model that predicts projected 3D bounding box corners and per-corner depths without requiring camera intrinsics at inference time. MoCA3D formulates pixel-space localization and depth assignment as dense prediction via corner heatmaps and depth maps. To evaluate image-plane geometric fidelity, we propose Pixel-Aligned Geometry (PAG), which directly measures image-plane corner and depth consistency. Extensive experiments demonstrate that MoCA3D achieves state-of-the-art performance, improving image-plane corner PAG by 22.8% while remaining…
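The abstract casts corner localization and depth assignment as dense prediction via corner heatmaps and depth maps. A minimal decoding sketch, assuming one heatmap and one depth map per corner given as nested lists, takes each heatmap's argmax pixel and reads the matching depth map at that location. The channel layout, the plain integer argmax (practical decoders often add sub-pixel refinement), and the names `decode_corners` and `unproject` are hypothetical, not the paper's implementation. The second function illustrates why image-plane corners plus depths are useful downstream: if intrinsics become available later, each corner can be lifted back to a camera-frame 3D point.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]  # an H x W map, row-major


def decode_corners(heatmaps: Sequence[Grid],
                   depth_maps: Sequence[Grid]) -> List[Tuple[int, int, float]]:
    """For corner k, take the argmax pixel of heatmap k and read depth map k there."""
    if len(heatmaps) != len(depth_maps):
        raise ValueError("need one depth map per corner heatmap")
    corners = []
    for heat, depth in zip(heatmaps, depth_maps):
        r, c = max(
            ((row, col) for row in range(len(heat)) for col in range(len(heat[0]))),
            key=lambda rc: heat[rc[0]][rc[1]],
        )
        # u is the column index, v the row index.
        corners.append((c, r, depth[r][c]))
    return corners


def unproject(corners, fx, fy, px, py):
    """Lift (u, v, depth) to camera-frame (X, Y, Z) once intrinsics are known."""
    return [((u - px) * z / fx, (v - py) * z / fy, z) for u, v, z in corners]
```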

Code & Models

Models

Hugging Face: jeoncwcw/MoCA3D (model)

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Vision and Imaging