DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition
Haijing Liu, Tao Pu, Hefeng Wu, Keze Wang, Liang Lin

TL;DR
DART is a framework for open-vocabulary multi-label recognition that combines adaptive intra-class refinement with inter-class transfer driven by external knowledge, and reports state-of-the-art results.
Contribution
The authors present DART as the first framework to explicitly incorporate LLM-derived relational knowledge for adaptive inter-class transfer while also performing adaptive intra-class refinement under weak supervision.
Findings
Achieves new state-of-the-art performance on benchmarks.
Effectively localizes objects with weak supervision.
Leverages structured knowledge for improved class reasoning.
Abstract
Open-Vocabulary Multi-Label Recognition (OV-MLR) aims to identify multiple seen and unseen object categories within an image, requiring both precise intra-class localization to pinpoint objects and effective inter-class reasoning to model complex category dependencies. While Vision-Language Pre-training (VLP) models offer a strong open-vocabulary foundation, they often struggle with fine-grained localization under weak supervision and typically fail to explicitly leverage structured relational knowledge beyond basic semantics, limiting performance especially for unseen classes. To overcome these limitations, we propose the Dual Adaptive Refinement Transfer (DART) framework. DART enhances a frozen VLP backbone via two synergistic adaptive modules. For intra-class refinement, an Adaptive Refinement Module (ARM) refines patch features adaptively, coupled with a novel Weakly Supervised…
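The two requirements named in the abstract, intra-class localization and inter-class reasoning, can be illustrated with a minimal sketch. This is not the paper's ARM or its transfer module, whose details are not given in the text above. It assumes patch features and class text embeddings already live in a shared space (as with a frozen VLP backbone), scores each class by its best-matching patch, and then adjusts scores with a hypothetical class-relation matrix. The names `class_scores`, `transfer`, `relation`, and `alpha` are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def class_scores(patch_feats, text_embeds):
    """Score each class by its best-matching patch.

    Returns (scores, best_patch): the max patch-text similarity per class and
    the index of the patch that produced it (a crude localization cue).
    """
    scores, best_patch = [], []
    for t in text_embeds:
        sims = [cosine(p, t) for p in patch_feats]
        k = max(range(len(sims)), key=sims.__getitem__)
        scores.append(sims[k])
        best_patch.append(k)
    return scores, best_patch

def transfer(scores, relation, alpha=0.5):
    """Add to each class a weighted share of the scores of related classes.

    relation[i][j] is a hypothetical weight for how strongly class j supports
    class i (for example, derived from an external knowledge source).
    """
    n = len(scores)
    return [
        scores[i] + alpha * sum(relation[i][j] * scores[j] for j in range(n) if j != i)
        for i in range(n)
    ]

# Toy inputs: two patches, three classes; class 2 is assumed related to class 0.
patches = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relation = [[0, 0, 0], [0, 0, 0], [1, 0, 0]]

scores, best_patch = class_scores(patches, texts)
refined = transfer(scores, relation)
```

Max pooling over patches is only one simple way to turn patch-level similarities into image-level class scores under image-level labels, and the additive relation update is likewise just one possible form of score transfer.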
