LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu

TL;DR
LaMI-DETR introduces a novel approach leveraging language model instructions to improve open-vocabulary object detection, addressing concept representation and overfitting issues, resulting in state-of-the-art performance without external training resources.
Contribution
The paper proposes LaMI-DETR, a simple DETR-like detector that uses GPT and T5 to enhance concept representation and reduce overfitting in open-vocabulary detection.
Findings
Achieves 43.4 rare box AP on OV-LVIS, surpassing the previous best by 7.8 rare box AP.
Effectively leverages language models to improve detection accuracy.
No external training resources required.
Abstract
Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge. (2) An overfitting tendency towards base categories, with the open vocabulary knowledge biased towards base categories during the transfer from VLMs to detectors. To address these challenges, we propose the Language Model Instruction (LaMI) strategy, which leverages the relationships between visual concepts and applies them within a simple yet effective DETR-like detector, termed LaMI-DETR. LaMI utilizes GPT to construct visual concepts and employs T5 to investigate visual similarities across categories. These inter-category relationships refine concept…
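The idea of comparing categories by visual similarity can be illustrated with a minimal sketch. It assumes each category already has an embedding vector, for example from a text encoder applied to a description of how the category looks. The vectors, category names, and the helpers `cosine` and `most_similar` below are hypothetical and chosen only for illustration; this is not the authors' implementation, and no real language model is called.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(embeddings, name):
    """Rank all other categories by similarity to `name`, most similar first."""
    query = embeddings[name]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != name
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored]


# Hypothetical toy vectors standing in for embeddings of visual descriptions.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "tiger": [0.8, 0.2, 0.1],
    "bus": [0.0, 0.1, 0.9],
}

ranking = most_similar(toy_embeddings, "cat")
print(ranking)
```

In this toy setup, categories whose vectors point in similar directions rank as visually related. A real pipeline would replace the hand-written vectors with encoder outputs.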
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Gated Linear Unit · Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Linear Layer · Weight Decay · SentencePiece · Softmax · Multi-Head Attention
