TL;DR
MonoPRIO applies adaptive prior conditioning to the size pathway of a unified monocular 3D object detector, targeting the unstable metric size estimates that arise in multi-class settings under occlusion, truncation, and scale-depth ambiguity.
Contribution
MonoPRIO builds class-aware size prototypes offline, routes each decoder query to a soft mixture prior, applies uncertainty-aware log-space conditioning, and adds Cluster-Aligned Prior (CAP) regularisation on matched positives during training.
Findings
Achieves the strongest multi-class results on the official KITTI test server.
Outperforms existing methods in car-only 3D bounding-box AP.
Provides ablation evidence for the effectiveness of routed injection and Cluster-Aligned Prior (CAP) regularisation.
Abstract
Monocular 3D object detection remains challenging because metric size and depth are underdetermined by single-view evidence, particularly under occlusion, truncation, and projection-induced scale-depth ambiguity. Although recent methods improve depth and geometric reasoning, metric size remains unstable in unified multi-class settings, where class variability and partial visibility broaden plausible size modes. We propose MonoPRIO, a unified monocular 3D detector that targets this bottleneck through adaptive prior conditioning in the size pathway. MonoPRIO constructs class-aware size prototypes offline, routes each decoder query to a soft mixture prior, applies uncertainty-aware log-space conditioning, and uses Cluster-Aligned Prior (CAP) regularisation on matched positives during training. On the official KITTI test server, MonoPRIO achieves the strongest fully reported unified…
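The size pathway described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the softmax routing, the precision-weighted fusion rule, and the prototype values are all assumptions chosen for illustration. The abstract only states that each query is routed to a soft mixture over class-aware prototypes and that conditioning is uncertainty-aware and happens in log space.

```python
import numpy as np

def soft_mixture_prior(routing_logits, prototypes_log):
    """Route one query to a soft mixture over K log-space size prototypes.

    routing_logits: (K,) hypothetical per-query routing scores.
    prototypes_log: (K, 3) log of (h, w, l) prototypes in metres.
    Returns mixture weights, mixture mean, and mixture variance per dimension.
    """
    w = np.exp(routing_logits - routing_logits.max())  # stable softmax
    w /= w.sum()
    mean = w @ prototypes_log
    var = w @ (prototypes_log - mean) ** 2
    return w, mean, var

def condition_log_size(pred_log, pred_var, prior_mean, prior_var, eps=1e-6):
    """Assumed fusion rule: precision-weighted average in log space.

    A highly uncertain prediction is pulled toward the prior; a confident
    prediction is left almost unchanged.
    """
    p_pred = 1.0 / (pred_var + eps)
    p_prior = 1.0 / (prior_var + eps)
    return (p_pred * pred_log + p_prior * prior_mean) / (p_pred + p_prior)

# Made-up prototypes for one class (not taken from the paper).
prototypes_log = np.log(np.array([[1.5, 1.6, 3.9],
                                  [1.7, 1.9, 5.0]]))
weights, prior_mean, prior_var = soft_mixture_prior(np.array([2.0, 0.0]),
                                                    prototypes_log)

# An occluded object: the network's raw size estimate is very uncertain.
raw_log_size = np.log(np.array([2.0, 2.0, 6.0]))
fused = condition_log_size(raw_log_size, np.full(3, 1e6), prior_mean, prior_var)
size_metres = np.exp(fused)  # back to metric (h, w, l)
```

Working in log space keeps the fused size positive after exponentiation and treats size errors multiplicatively, which is one plausible reason to condition there.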
