Discrepancy-Aware Attention Network for Enhanced Audio-Visual Zero-Shot Learning
RunLin Yu, Yipu Gong, Wenrui Li, Aiwen Sun, Mengren Zheng

TL;DR
This paper introduces a Discrepancy-Aware Attention Network that improves audio-visual zero-shot learning by addressing modality quality and content discrepancies, achieving state-of-the-art results.
Contribution
It proposes two modules, a Quality-Discrepancy Mitigation Attention (QDMA) unit and CSGM, to mitigate quality and content discrepancies in audio-visual ZSL and strengthen the model's discriminative capabilities.
Findings
Achieves state-of-the-art performance on benchmark datasets.
Validates effectiveness of proposed modules through ablation studies.
Balances modality contributions for improved zero-shot learning.
Abstract
Audio-visual Zero-Shot Learning (ZSL) has attracted significant attention for its ability to identify unseen classes and perform well in video classification tasks. However, modal imbalance in (G)ZSL leads to over-reliance on the optimal modality, reducing discriminative capabilities for unseen classes. Some studies have attempted to address this issue by modifying parameter gradients, but two challenges still remain: (a) Quality discrepancies, where modalities offer differing quantities and qualities of information for the same concept. (b) Content discrepancies, where sample contributions within a modality vary significantly. To address these challenges, we propose a Discrepancy-Aware Attention Network (DAAN) for Enhanced Audio-Visual ZSL. Our approach introduces a Quality-Discrepancy Mitigation Attention (QDMA) unit to minimize redundant information in the high-quality modality and a…
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Image Processing Techniques
Methods: Softmax · Attention Is All You Need
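
For readers unfamiliar with the two methods tagged above, the following is a minimal sketch of softmax and scaled dot-product attention, applied across modalities so that audio frames attend over visual frames. It is a generic illustration only and not the paper's QDMA or CSGM module. The function names (`softmax`, `cross_attention`) and the toy feature vectors are assumptions made up for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    queries: list of query vectors (here, audio frames)
    keys, values: lists of key/value vectors (here, visual frames)
    Returns the attended outputs and the attention weights.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values))
             for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy, made-up features: 2 audio frames attend over 3 visual frames (dim 4).
audio = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
visual = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
fused, attn = cross_attention(audio, visual, visual)
```

Each row of `attn` is a probability distribution over the visual frames, so a visual frame that matches an audio frame more closely receives a larger weight. A discrepancy-aware design would presumably reshape these weights in some way, but how the paper does so is not stated in the text above.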
