Lenna: Language Enhanced Reasoning Detection Assistant

Fei Wei; Xinyu Zhang; Ailing Zhang; Bo Zhang; Xiangxiang Chu

arXiv:2312.02433 · cs.CV · December 6, 2023 · 1 citation

TL;DR

Lenna is a multimodal reasoning detection assistant that adds a dedicated detection token to a large language model, and it reports high performance at low training cost on reasoning-based detection tasks.

Contribution

The paper introduces Lenna, an approach that improves reasoning-based detection in multimodal models by adding a dedicated detection token (<DET>) to the MLLM vocabulary, and evaluates it on a newly constructed dataset, ReasonDet.

Findings

01. Lenna outperforms existing methods on the ReasonDet dataset.

02. It requires significantly less training data and lower training cost.

03. It transfers to other tasks with minimal overhead.

Abstract

With the fast-paced development of multimodal large language models (MLLMs), we can now converse with AI systems in natural languages to understand images. However, the reasoning power and world knowledge embedded in the large language models have been much less investigated and exploited for image perception tasks. In this paper, we propose Lenna, a language-enhanced reasoning detection assistant, which utilizes the robust multimodal feature representation of MLLMs, while preserving location information for detection. This is achieved by incorporating an additional <DET> token in the MLLM vocabulary that is free of explicit semantic context but serves as a prompt for the detector to identify the corresponding position. To evaluate the reasoning capability of Lenna, we construct a ReasonDet dataset to measure its performance on reasoning-based detection. Remarkably, Lenna demonstrates…
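The <DET> mechanism described in the abstract can be illustrated with a toy sketch. This is not Lenna's implementation: the vocabulary, the embedding table, the stand-in "model" (a plain embedding lookup), and the helper names add_special_token and det_prompt_vectors are all assumptions made for illustration. It only shows the general idea of adding a token with no semantic content of its own and reading out the hidden state at its position as a prompt for a downstream detector.

```python
import random

DET_TOKEN = "<DET>"  # token name taken from the abstract


def build_vocab(words):
    """Map each distinct token to an integer id, in first-seen order."""
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab


def add_special_token(vocab, embeddings, token, dim, rng):
    """Append a new token and a freshly initialised embedding row (hypothetical helper)."""
    if token not in vocab:
        vocab[token] = len(vocab)
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return vocab[token]


def det_prompt_vectors(token_ids, hidden_states, det_id):
    """Return the hidden state at every <DET> position (hypothetical helper)."""
    return [h for t, h in zip(token_ids, hidden_states) if t == det_id]


rng = random.Random(0)
dim = 4  # toy hidden size, chosen arbitrarily

# Toy vocabulary and embedding table standing in for the MLLM's.
vocab = build_vocab("the cup on the table is".split())
embeddings = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in vocab]

# Extend the vocabulary with the detection token.
det_id = add_special_token(vocab, embeddings, DET_TOKEN, dim, rng)

# Assumed model output that ends with the detection token.
output_tokens = ["the", "cup", "is", DET_TOKEN]
token_ids = [vocab[t] for t in output_tokens]

# Stand-in for the language model: hidden state = embedding lookup.
hidden_states = [embeddings[i] for i in token_ids]

# These vectors are what a detector would receive as its prompt.
prompts = det_prompt_vectors(token_ids, hidden_states, det_id)
```

In a real system the hidden states would come from the language model's final layer rather than from a lookup table, and the extracted vector would condition an object detector; how Lenna does either is not stated in the text above.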

Code & Models

Repositories

meituan-automl/lenna (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling