DSAA: Dual-Stage Attribute Activation for Fine-grained Open Vocabulary Detection
Donghong Jiang, Endian Lin, Hanqing Liu, Mingjie Liu, Luoping Cui, Zhao Yang, Chuang Zhu

TL;DR
This paper introduces DSAA, a dual-stage framework that enhances fine-grained attribute detection in open-vocabulary object detection models by strengthening attribute semantics during inference and training.
Contribution
The paper proposes the DSAA framework, which combines new modules with a dedicated loss to improve attribute recognition in open-vocabulary detection models.
Findings
Improved attribute detection accuracy on the FG-OVD benchmark.
Enhanced discrimination among instances that differ in their attributes.
Effective across various open-vocabulary detection models.
Abstract
Open-Vocabulary Object Detection (OVD) models break the limitations of closed-set detection, enabling the identification of unseen categories through natural language prompts. However, they exhibit notable limitations in fine-grained detection tasks involving attributes like color, material, and texture. We attribute this performance bottleneck in OVD models to a core issue: when category signals dominate, OVD models tend to marginalize attribute information during inference. This leads to incorrect binding between attributes and target objects. To address this, we propose the Dual-Stage Attribute Activation (DSAA) framework, which enhances fine-grained detection capabilities by strengthening attribute semantics at two critical stages. In the text embedding stage, we employ Attribute Prefix Adapter (APA) module to generate attribute prefixes that inject explicit…
