LLM-Guided Agentic Object Detection for Open-World Understanding
Furkan Mumcu, Michael J. Jones, Anoop Cherian, Yasin Yilmaz

TL;DR
This paper introduces LAOD, a framework in which a large language model proposes scene-specific object names for an open-vocabulary detector, enabling zero-shot, label-free object detection and naming in open-world scenes without user-supplied prompts.
Contribution
We propose LAOD, a framework that uses LLMs to generate scene-specific object names for open-vocabulary detection without prior labels, advancing open-world understanding.
Findings
Effective detection and naming of novel objects demonstrated on LVIS, COCO, and COCO-OOD datasets.
Introduction of two metrics, Class-Agnostic Average Precision (CAAP) and Semantic Naming Average Precision (SNAP), which evaluate localization and naming separately.
Strong performance in zero-shot detection and naming tasks, surpassing existing methods.
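To make the idea of scoring localization separately from naming more concrete, here is a minimal sketch of a class-agnostic average precision at a single IoU threshold. This summary does not give the paper's exact definition of CAAP, so everything below is an assumption: the function names `iou` and `class_agnostic_ap`, the greedy matching rule, the 0.5 threshold, and the non-interpolated AP formula are illustrative only. SNAP is not sketched because the criterion for a correct name is not stated here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_ap(
    pred_boxes: Sequence[Box],
    pred_scores: Sequence[float],
    gt_boxes: Sequence[Box],
    iou_thr: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Average precision that ignores labels: a prediction counts as a true
    positive if it overlaps any still-unmatched ground-truth box."""
    if not gt_boxes:
        return 0.0
    # Visit predictions from most to least confident.
    order = sorted(range(len(pred_boxes)), key=lambda i: -pred_scores[i])
    matched = set()
    tp_flags: List[bool] = []
    for i in order:
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue
            overlap = iou(pred_boxes[i], gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    # Non-interpolated AP: mean of precision values at each true positive,
    # normalized by the number of ground-truth boxes.
    ap, tp = 0.0, 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        if is_tp:
            tp += 1
            ap += tp / rank
    return ap / len(gt_boxes)
```

Because labels never enter the computation, a detector that localizes an object correctly but names it wrongly is not penalized here; that error would be left to a separate naming metric.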
Abstract
Object detection traditionally relies on fixed category sets, requiring costly re-training to handle novel objects. While Open-World and Open-Vocabulary Object Detection (OWOD and OVOD) improve flexibility, OWOD lacks semantic labels for unknowns, and OVOD depends on user prompts, limiting autonomy. We propose an LLM-guided agentic object detection (LAOD) framework that enables fully label-free, zero-shot detection by prompting a Large Language Model (LLM) to generate scene-specific object names. These are passed to an open-vocabulary detector for localization, allowing the system to adapt its goals dynamically. We introduce two new metrics, Class-Agnostic Average Precision (CAAP) and Semantic Naming Average Precision (SNAP), to separately evaluate localization and naming. Experiments on LVIS, COCO, and COCO-OOD validate our approach, showing strong performance in detecting and naming…
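The two-stage flow described in the abstract (an LLM proposes object names for the scene, then an open-vocabulary detector localizes them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose_names` and `detect` are hypothetical callables standing in for whatever LLM and detector are used, and the reply format, parsing rule, and score threshold are all invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class Detection:
    name: str                                # object name proposed by the LLM
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2)
    score: float                             # detector confidence


def parse_object_names(llm_reply: str) -> List[str]:
    """Split a comma- or newline-separated reply into clean, unique names.
    The reply format is an assumption made for this sketch."""
    seen, names = set(), []
    for raw in llm_reply.replace("\n", ",").split(","):
        name = raw.strip().strip(".").lower()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def laod_style_detect(
    image: Any,
    propose_names: Callable[[Any], str],                         # hypothetical LLM wrapper
    detect: Callable[[Any, Sequence[str]], List[Detection]],     # hypothetical detector wrapper
    score_threshold: float = 0.3,                                # assumed value
) -> List[Detection]:
    """Stage 1: ask the LLM for scene-specific names.
    Stage 2: pass those names to an open-vocabulary detector."""
    names = parse_object_names(propose_names(image))
    if not names:
        return []
    detections = detect(image, names)
    return [d for d in detections if d.score >= score_threshold]
```

With stub callables, for example an LLM stub returning "Dog, frisbee" and a detector stub that returns one box per name, the function yields only the detections whose score clears the threshold. No user prompt appears anywhere in the loop, which is the autonomy property the abstract emphasizes.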
Taxonomy
Topics: Robotics and Automated Systems · Robotic Path Planning Algorithms · Advanced Neural Network Applications
