DVLA-RL: Dual-Level Vision-Language Alignment with Reinforcement Learning Gating for Few-Shot Learning

Wenhao Li; Xianjing Meng; Qiangchang Wang; Zhongyi Han; Zhibin Wu; Yilong Yin

arXiv:2602.00795 · cs.CV · February 25, 2026

TL;DR

DVLA-RL introduces a dual-level vision-language alignment framework with reinforcement learning gating, enhancing few-shot learning by progressively integrating low-level attributes and high-level semantics for better cross-modal understanding.

Contribution

The paper proposes a novel dual-level semantic construction and RL-gated attention mechanism for adaptive vision-language alignment in few-shot learning.

Findings

1. Achieves state-of-the-art results on nine benchmarks.
2. Effectively combines low-level attributes with high-level semantics.
3. Dynamically adjusts cross-modal contributions for improved discrimination.

Abstract

Few-shot learning (FSL) aims to generalize to novel categories with only a few samples. Recent approaches incorporate large language models (LLMs) to enrich visual representations with semantic embeddings derived from class names. However, they overlook progressive and adaptive alignment between vision and language from low-level to high-level semantics, resulting in limited semantic gains. To address these challenges, we propose Dual-level Vision-Language Alignment with Reinforcement Learning gating (DVLA-RL), which consists of Dual-level Semantic Construction (DSC) and RL-gated Attention (RLA). Specifically, DSC conditions LLMs on both class names and support samples to generate discriminative attributes, progressively selects the most relevant ones, and then synthesizes them into coherent class descriptions. This process provides complementary low-level attributes and high-level…
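
The abstract says that DSC keeps "the most relevant" generated attributes and that the model adjusts cross-modal contributions, but the (truncated) text does not say how either step is computed. The sketch below is a minimal, hypothetical illustration under stated assumptions, not the authors' method. It assumes attribute texts and support images are already embedded in a shared space, scores relevance as cosine similarity to the mean support embedding (keeping the top k), and balances vision and language with a fixed scalar gate in a convex combination. According to the title, DVLA-RL drives its gating with reinforcement learning, which this sketch does not model. All function names and toy vectors here are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prototype(support: List[Vector]) -> List[float]:
    """Element-wise mean of the support embeddings (a class prototype)."""
    n = len(support)
    return [sum(column) / n for column in zip(*support)]


def select_attributes(attributes: Dict[str, Vector], support: List[Vector], k: int) -> List[str]:
    """Hypothetical selection rule: keep the k attribute texts whose embeddings
    are most cosine-similar to the support prototype (ties keep input order)."""
    proto = prototype(support)
    ranked = sorted(attributes, key=lambda name: cosine(attributes[name], proto), reverse=True)
    return ranked[:k]


def gated_fusion(visual: Vector, text: Vector, gate: float) -> List[float]:
    """Hypothetical convex mix: gate=0 keeps only the visual feature,
    gate=1 keeps only the text feature."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return [(1.0 - gate) * v + gate * t for v, t in zip(visual, text)]


if __name__ == "__main__":
    # Toy 3-d embeddings for two support images of one class (illustrative only).
    support = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
    attributes = {
        "striped fur": [1.0, 0.1, 0.0],
        "long beak": [0.0, 0.0, 1.0],
        "four legs": [0.5, 0.5, 0.0],
    }
    print(select_attributes(attributes, support, k=2))  # ['striped fur', 'four legs']
    print(gated_fusion([1.0, 0.0], [0.0, 1.0], gate=0.25))  # [0.75, 0.25]
```

Fixing the gate keeps the example deterministic; an adaptive scheme would instead produce a different gate value per sample or per episode.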

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications