DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition

Haijing Liu; Tao Pu; Hefeng Wu; Keze Wang; Liang Lin

arXiv:2508.05585·cs.CV·August 8, 2025


TL;DR

DART is a framework for open-vocabulary multi-label recognition that combines adaptive intra-class refinement with inter-class transfer driven by external, LLM-derived knowledge; the authors report state-of-the-art results.

Contribution

DART is the first framework to explicitly incorporate LLM-derived relational knowledge for adaptive inter-class transfer while also performing adaptive intra-class refinement under weak supervision.

Findings

01. Achieves new state-of-the-art performance on benchmarks.

02. Effectively localizes objects with weak supervision.

03. Leverages structured knowledge for improved class reasoning.

Abstract

Open-Vocabulary Multi-Label Recognition (OV-MLR) aims to identify multiple seen and unseen object categories within an image, requiring both precise intra-class localization to pinpoint objects and effective inter-class reasoning to model complex category dependencies. While Vision-Language Pre-training (VLP) models offer a strong open-vocabulary foundation, they often struggle with fine-grained localization under weak supervision and typically fail to explicitly leverage structured relational knowledge beyond basic semantics, limiting performance especially for unseen classes. To overcome these limitations, we propose the Dual Adaptive Refinement Transfer (DART) framework. DART enhances a frozen VLP backbone via two synergistic adaptive modules. For intra-class refinement, an Adaptive Refinement Module (ARM) refines patch features adaptively, coupled with a novel Weakly Supervised…
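
For readers new to the task, the generic open-vocabulary multi-label setup that the abstract builds on can be sketched as follows. This is a minimal illustration only, not DART's Adaptive Refinement Module or its training objective: it assumes patch features and class-name text embeddings have already been produced by some frozen vision-language model, and the function names, the softmax-weighted pooling, the temperature, and the threshold are all hypothetical choices made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def class_scores(patch_feats, text_embeds, temperature=0.1):
    """Score each class by softmax-weighted pooling of patch-text similarities.

    patch_feats: list of patch feature vectors (assumed to come from a frozen VLP image encoder)
    text_embeds: dict mapping class name -> text embedding (assumed to come from the paired text encoder)
    """
    scores = {}
    for name, text_vec in text_embeds.items():
        sims = [cosine(p, text_vec) for p in patch_feats]
        top = max(sims)  # subtract the max for numerical stability
        weights = [math.exp((s - top) / temperature) for s in sims]
        total = sum(weights)
        # Patches that match the class dominate the pooled score.
        scores[name] = sum(w * s for w, s in zip(weights, sims)) / total
    return scores

def predict(patch_feats, text_embeds, threshold=0.5):
    """Return every class whose pooled score clears the (hypothetical) threshold."""
    scores = class_scores(patch_feats, text_embeds)
    return sorted(name for name, s in scores.items() if s >= threshold)

# Toy data: three patches and three candidate labels in a 3-d embedding space.
patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
labels = {
    "dog": [1.0, 0.0, 0.0],
    "frisbee": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
print(predict(patches, labels))  # ['dog', 'frisbee']
```

Because labels enter only as text embeddings, an unseen category can be scored by adding its embedding to `labels` without retraining. Pooling over patches is what lets several objects in one image each find their own supporting region.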
