Towards Intrinsic-Aware Monocular 3D Object Detection

Zhihao Zhang; Abhinav Kumar; Xiaoming Liu

arXiv:2603.27059 · cs.CV · March 31, 2026

TL;DR

MonoIA introduces a semantic, language-grounded approach to monocular 3D object detection that makes detection robust to variations in camera intrinsics and improves performance across multiple benchmarks.

Contribution

The paper presents MonoIA, an intrinsic-aware framework that models camera intrinsics as perceptual transformations, using large language models and vision-language models to encode them, which improves cross-camera robustness.

Findings

1. Achieves state-of-the-art results on the KITTI, Waymo, and nuScenes benchmarks.
2. Improves detection performance by +1.18% on KITTI and by +4.46% in the multi-dataset training setting.
3. Demonstrates robustness to intrinsic variations through semantic intrinsic embeddings.

Abstract

Monocular 3D object detection (Mono3D) aims to infer object locations and dimensions in 3D space from a single RGB image. Despite recent progress, existing methods remain highly sensitive to camera intrinsics and struggle to generalize across diverse settings, since intrinsics govern how 3D scenes are projected onto the image plane. We propose MonoIA, a unified intrinsic-aware framework that models and adapts to intrinsic variation through a language-grounded representation. The key insight is that intrinsic variation is not a numeric difference but a perceptual transformation that alters apparent scale, perspective, and spatial geometry. To capture this effect, MonoIA employs large language models and vision-language models to generate intrinsic embeddings that encode the visual and geometric implications of camera parameters. These embeddings are hierarchically integrated into the…
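
The abstract's premise, that intrinsics govern how a 3D scene is projected onto the image plane and that changing them alters apparent scale, can be made concrete with the standard pinhole camera model. The sketch below is illustrative only and is not part of MonoIA: the `Intrinsics` class, the helper functions, and both camera configurations are hypothetical. It shows the same car at the same depth appearing at different pixel heights under two focal lengths, so a model that implicitly assumes one camera misreads depth under the other.

```python
# Pinhole camera model (illustrative, not MonoIA): how intrinsics change
# the projection of the same 3D scene.
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point x (pixels)
    cy: float  # principal point y (pixels)


def project(point, K):
    """Project a camera-frame 3D point (X, Y, Z) in meters to pixel coordinates (u, v)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (K.fx * X / Z + K.cx, K.fy * Y / Z + K.cy)


def apparent_height(height_m, depth_m, K):
    """Pixel height of an upright, fronto-parallel object of real height height_m at depth depth_m."""
    return K.fy * height_m / depth_m


def depth_from_height(height_px, height_m, K):
    """Invert apparent_height: the depth implied by a pixel height under intrinsics K."""
    return K.fy * height_m / height_px


# Two hypothetical cameras that differ only in focal length.
K_a = Intrinsics(fx=700.0, fy=700.0, cx=600.0, cy=180.0)
K_b = Intrinsics(fx=1400.0, fy=1400.0, cx=600.0, cy=180.0)

car_height, car_depth = 1.5, 20.0  # meters
h_a = apparent_height(car_height, car_depth, K_a)  # 52.5 px
h_b = apparent_height(car_height, car_depth, K_b)  # 105.0 px

# A model that implicitly assumes K_a but receives an image from K_b
# interprets the larger box as a closer object.
wrong_depth = depth_from_height(h_b, car_height, K_a)  # 10.0 m instead of 20.0 m
print(h_a, h_b, wrong_depth)
```

Doubling the focal length doubles the apparent size, which a regressor that ignores intrinsics cannot distinguish from halving the depth. This scale and depth ambiguity is one instance of the apparent-scale change the abstract attributes to intrinsic variation.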

Code & Models

Hugging Face model: zhihao406/MonoIA