GUIDED: Granular Understanding via Identification, Detection, and Discrimination for Fine-Grained Open-Vocabulary Object Detection

Jiaming Li; Zhijia Liang; Weikai Chen; Lin Ma; Guanbin Li

arXiv:2603.27014·cs.CV·March 31, 2026

TL;DR

GUIDED introduces a modular framework for fine-grained open-vocabulary object detection, disentangling subject localization from attribute recognition to improve accuracy and robustness.

Contribution

It proposes a novel decomposition approach that separates localization from recognition, adds attribute fusion and discrimination modules, and achieves state-of-the-art results.

Findings

01. Achieves new state-of-the-art results on the FG-OVD and 3F-OVD benchmarks.

02. Effectively disentangles subject localization from attribute recognition.

03. Improves detection accuracy by mitigating attribute over-representation.

Abstract

Fine-grained open-vocabulary object detection (FG-OVD) aims to detect novel object categories described by attribute-rich texts. While existing open-vocabulary detectors show promise at the base-category level, they underperform in fine-grained settings because subjects and attributes are semantically entangled in pretrained vision-language model (VLM) embeddings. This entanglement leads to over-representation of attributes, mislocalization, and semantic drift in the embedding space. We propose GUIDED, a decomposition framework specifically designed to address the semantic entanglement between subjects and attributes in fine-grained prompts. By separating object localization and fine-grained recognition into distinct pathways, GUIDED aligns each subtask with the module best suited to its role. Specifically, given a fine-grained class name, we first use a language model to extract a…
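The core idea of routing localization and fine-grained recognition through separate pathways can be illustrated with a toy sketch. This is not the authors' implementation, and it does not reproduce the GUIDED modules, whose details are cut off in this excerpt. The functions `localize` and `discriminate`, the threshold, and the 3-dimensional embeddings (axes loosely standing for "car-ness", an unrelated feature, and "redness") are hypothetical stand-ins for VLM region and text embeddings. The sketch shows how a subject-only localization step can reject a region that a single attribute-heavy prompt would wrongly match.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def localize(regions: Dict[str, Vector], subject_emb: Vector,
             threshold: float) -> List[str]:
    """Hypothetical pathway 1: keep regions matching the subject-only
    text embedding, so attribute words cannot drive localization."""
    return [rid for rid, emb in regions.items()
            if cosine(emb, subject_emb) >= threshold]


def discriminate(regions: Dict[str, Vector], kept: List[str],
                 fine_grained: Dict[str, Vector]) -> Dict[str, Tuple[str, float]]:
    """Hypothetical pathway 2: assign the best fine-grained label, but only
    to regions that pathway 1 already localized."""
    labels = {}
    for rid in kept:
        scores = ((name, cosine(regions[rid], emb))
                  for name, emb in fine_grained.items())
        labels[rid] = max(scores, key=lambda pair: pair[1])
    return labels


# Toy data (hypothetical): a red car and a red apple.
regions = {"car_region": [1.0, 0.0, 0.9], "apple_region": [0.0, 0.2, 1.0]}
subject = [1.0, 0.0, 0.0]                                   # "car"
prompts = {"red car": [1.0, 0.0, 1.0], "blue car": [1.0, 0.0, -1.0]}

kept = localize(regions, subject, threshold=0.5)
print(kept)                                                 # ['car_region']
print(discriminate(regions, kept, prompts)["car_region"][0])  # red car
```

With a single entangled prompt, the red apple scores about 0.69 against "red car" and would pass the same 0.5 threshold, an attribute-driven false positive. Gating on the subject embedding first rejects it because its "car" similarity is 0.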

Code & Models

Repositories

lijm48/GUIDED (GitHub)

Videos

GUIDED: Granular Understanding via Identification, Detection, and Discrimination for Fine-Grained Open-Vocabulary Object Detection (SlidesLive)