Dynamic Visual-semantic Alignment for Zero-shot Learning with Ambiguous Labels

Jiangnan Li; Linqing Huang; Xiaowen Yan; Min Gan; Wenpeng Lu; Jinfu Fan

arXiv:2604.17710 · cs.CV · April 21, 2026

TL;DR

This paper introduces DVSA, a zero-shot learning framework that handles ambiguous labels through bidirectional visual-semantic alignment, attribute-level contrastive optimization grounded in mutual information, and dynamic label disambiguation, improving recognition of unseen classes.

Contribution

The paper proposes a novel dynamic visual-semantic alignment method with label disambiguation for zero-shot learning under ambiguous supervision, enhancing robustness and accuracy.

Findings

01. DVSA outperforms existing methods on standard benchmarks.

02. The dynamic label disambiguation reduces noise and improves generalization.

03. Contrastive MI optimization enhances attribute discriminability.
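
The contrastive MI optimization named in the findings above is not specified on this page. A common way to optimize a mutual-information objective contrastively is the InfoNCE loss, which lower-bounds MI between an anchor and its positive. The sketch below assumes that form; the function name `info_nce`, the cosine-similarity scoring, and the `temperature` value are illustrative assumptions, not the paper's definitions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor (assumed form, not taken from the paper).

    anchor:    e.g. an attribute-level visual feature
    positive:  the matching attribute prototype
    negatives: prototypes of non-matching attributes
    """
    a = normalize(anchor)
    # Index 0 holds the positive pair; the rest are negatives.
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive.
    return log_denom - logits[0]

# A feature aligned with its prototype gives a lower loss than a misaligned one.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
```

Minimizing this loss pulls each feature toward its own prototype and away from the others, which is one plausible reading of "enhances attribute discriminability".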

Abstract

Zero-shot learning (ZSL) aims to recognize unseen classes without visual instances. However, existing methods usually assume clean labels, overlooking real-world label noise and ambiguity, which degrades performance. To bridge this gap, we propose the Dynamic Visual-semantic Alignment (DVSA), a robust ZSL framework for learning from ambiguous labels. DVSA uses a bidirectional visual-semantic alignment module with attention to mutually calibrate visual features and attribute prototypes, and a contrastive optimization grounded in Mutual Information (MI) at the attribute level to strengthen discriminative, semantically consistent attributes. In addition, a dynamic label disambiguation mechanism iteratively corrects noisy supervision while preserving semantic consistency, narrowing the instance-label gap, and improving generalization. Extensive experiments on standard benchmarks verify that…
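
The abstract describes the dynamic label disambiguation mechanism only at a high level. As a minimal sketch of the general idea, assuming a partial-label setting where each training sample carries a set of candidate labels, one iteration can re-weight the candidates toward the model's current predictions. The function `disambiguate`, the moving-average update, and the `momentum` parameter are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def disambiguate(weights, scores, candidates, momentum=0.9):
    """One soft-label update for a single sample (hypothetical rule).

    weights:    current label weights over all classes
                (zero outside the candidate set, summing to 1)
    scores:     model logits for this sample
    candidates: set of class indices allowed by the ambiguous label
    """
    probs = softmax(scores)
    # Keep only the probability mass that falls on candidate labels.
    masked = [p if k in candidates else 0.0 for k, p in enumerate(probs)]
    z = sum(masked)
    target = [p / z for p in masked]
    # Moving average: weights drift toward the model's belief gradually.
    return [momentum * w + (1 - momentum) * t for w, t in zip(weights, target)]

# Sample with candidate labels {0, 2}; class 1 scores highest but is not a candidate.
new_weights = disambiguate([0.5, 0.0, 0.5], [2.0, 5.0, 0.0], {0, 2})
```

In this example the weight shifts from an even split toward class 0, the candidate the model prefers, while class 1 stays at zero because it is outside the candidate set. Repeating the update across training epochs is what would make such a scheme iterative.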
