LMM-Det: Make Large Multimodal Models Excel in Object Detection

Jincheng Li; Chunyu Xie; Ji Ao; Dawei Leng; Yuhui Yin

arXiv:2507.18300·cs.CV·July 25, 2025

TL;DR

LMM-Det shows that large multimodal models can be adapted for object detection without specialized detection modules, by analyzing where their detection performance falls short and then adjusting the data distribution and optimizing inference.

Contribution

Proposes a simple approach to enable large multimodal models to perform object detection without additional detection-specific components.

Findings

01

LMMs show significantly lower recall on object detection than specialist detectors.

02

Data distribution adjustment and inference optimization improve detection recall.

03

Extensive experiments validate the effectiveness of LMM-Det.
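The recall gap in finding 01 can be made concrete with a small sketch. The code below is not taken from the paper; it is a minimal illustration, assuming axis-aligned boxes in (x1, y1, x2, y2) format, an IoU threshold of 0.5, and a simple greedy one-to-one matching. The names `iou` and `recall_at_iou` are hypothetical helpers introduced here for illustration only.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(gt: List[Box], pred: List[Box], thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by some prediction.

    Greedy matching: each ground-truth box takes the unused prediction
    with the highest IoU, provided that IoU is at least `thr`.
    """
    if not gt:
        # Recall is undefined without ground truth; 0.0 is a convention here.
        return 0.0
    used = set()
    hits = 0
    for g in gt:
        best_j, best_iou = -1, thr
        for j, p in enumerate(pred):
            if j in used:
                continue
            v = iou(g, p)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
            hits += 1
    return hits / len(gt)


if __name__ == "__main__":
    gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
    pred_boxes = [(0, 0, 10, 10), (21, 21, 30, 30)]
    # Two of the three ground-truth boxes are found; the third is missed.
    print(recall_at_iou(gt_boxes, pred_boxes))
```

Under this definition, a model that outputs too few boxes per image has its recall capped no matter how accurate each box is, which is one way a recall gap can arise. Standard benchmarks such as COCO use a more involved protocol (per-category matching, multiple IoU thresholds), so this sketch is only a simplified stand-in.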

Abstract

Large multimodal models (LMMs) have garnered widespread attention and interest within the artificial intelligence research and industrial communities, owing to their remarkable capability in multimodal understanding, reasoning, and in-context learning, among others. While LMMs have demonstrated promising results in tackling multimodal tasks like image captioning, visual question answering, and visual grounding, the object detection capabilities of LMMs exhibit a significant gap compared to specialist detectors. To bridge the gap, we depart from the conventional methods of integrating heavy detectors with LMMs and propose LMM-Det, a simple yet effective approach that leverages a Large Multimodal Model for vanilla object Detection without relying on specialized detection modules. Specifically, we conduct a comprehensive exploratory analysis when a large multimodal model meets with object…

Code & Models

Models

🤗 qihoo360/LMM-Det (model)

Taxonomy

Topics: Machine Learning and Data Classification