ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations

Amirreza Rouhi; Solmaz Arezoomandan; Knut Peterson; Joseph T. Woods; David K. Han

arXiv:2506.08968·cs.CV·June 11, 2025

TL;DR

ADAM is a self-refining, training-free framework that uses large language models and visual embeddings to enable open-world object detection and annotation without prior category labels.

Contribution

ADAM introduces a self-refining, category-agnostic approach that combines LLMs with visual embeddings to annotate objects in open-world settings without retraining.

Findings

01. Effective annotation of novel categories demonstrated on COCO and PASCAL datasets.

02. No fine-tuning or retraining required for new object categories.

03. Improves open-world object detection through self-refinement and contextual reasoning.

Abstract

Object detection models typically rely on predefined categories, limiting their ability to identify novel objects in open-world scenarios. To overcome this constraint, we introduce ADAM: Autonomous Discovery and Annotation Model, a training-free, self-refining framework for open-world object labeling. ADAM leverages large language models (LLMs) to generate candidate labels for unknown objects based on contextual information from known entities within a scene. These labels are paired with visual embeddings from CLIP to construct an Embedding-Label Repository (ELR) that enables inference without category supervision. For a newly encountered unknown object, ADAM retrieves visually similar instances from the ELR and applies frequency-based voting and cross-modal re-ranking to assign a robust label. To further enhance consistency, we introduce a self-refinement loop that re-evaluates…
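
The retrieval, voting, and re-ranking steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the parameters `k` and `top_labels`, and the toy 2-D vectors standing in for CLIP embeddings are all hypothetical, and the re-ranking rule (image-to-text cosine similarity over the most-voted labels) is one plausible reading, since the abstract does not specify how the two stages are combined.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_unknown(query_emb, repository, text_embs, k=5, top_labels=3):
    """Assign a label to an unknown object (illustrative sketch only).

    repository: list of (visual_embedding, label) pairs, a stand-in for the ELR.
    text_embs:  dict mapping each label to a text embedding (assumed to live
                in the same space as the visual embeddings, as in CLIP).
    """
    # 1. Retrieve the k visually most similar stored instances.
    neighbours = sorted(
        repository, key=lambda entry: cosine(query_emb, entry[0]), reverse=True
    )[:k]

    # 2. Frequency-based voting over the labels of those neighbours.
    votes = Counter(label for _, label in neighbours)
    candidates = [label for label, _ in votes.most_common(top_labels)]

    # 3. Cross-modal re-ranking (assumed rule): keep the candidate whose
    #    text embedding is closest to the query's visual embedding.
    return max(candidates, key=lambda label: cosine(query_emb, text_embs[label]))


# Toy data: 2-D vectors standing in for real CLIP embeddings.
repository = [
    ([1.0, 0.0], "dog"),
    ([0.9, 0.1], "dog"),
    ([0.8, 0.3], "wolf"),
    ([0.0, 1.0], "car"),
    ([0.1, 0.9], "car"),
]
text_embs = {"dog": [1.0, 0.1], "wolf": [0.7, 0.7], "car": [0.0, 1.0]}

predicted = label_unknown([0.95, 0.05], repository, text_embs, k=3)
print(predicted)
```

In this toy setup the voting stage narrows the label set to those supported by several neighbours, and the re-ranking stage uses the text side of the embedding space to choose among them. A real pipeline would replace the toy vectors with embeddings from a CLIP model and the sorted list with an approximate nearest-neighbour index.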

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Contrastive Language-Image Pre-training · Adam · Early Learning Regularization