Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework
Liuyuan Wen

TL;DR
This paper proposes a unified framework for audio-visual generalized zero-shot learning that integrates embedding and generative methods using out-of-distribution detection to improve classification of seen and unseen classes.
Contribution
It introduces a novel framework combining generative adversarial networks and OOD detection for audio-visual GZSL, addressing limitations of existing methods.
Findings
Significant performance improvements over state-of-the-art methods.
Effective synthesis of unseen features using GANs.
Robust OOD detection for better class differentiation.
Abstract
Generalized Zero-Shot Learning (GZSL) is a challenging task requiring accurate classification of both seen and unseen classes. Within this domain, audio-visual GZSL is an especially exciting yet difficult task, because it takes both visual and acoustic features as multi-modal inputs. Existing efforts in this field mostly use either embedding-based or generative methods. However, generative training is difficult and unstable, while embedding-based methods often suffer from the domain shift problem. Thus, we find it promising to integrate both methods into a unified framework that leverages their advantages while mitigating their respective disadvantages. Our study introduces a general framework employing out-of-distribution (OOD) detection, aiming to harness the strengths of both approaches. We first employ generative adversarial networks to synthesize unseen features,…
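The abstract is truncated here, so the exact pipeline is not available on this page. As a rough illustration of the general idea of using OOD detection to separate seen from unseen classes, the sketch below routes a feature vector to one of two classifiers based on an OOD score. Everything in it is an assumption made for illustration and not the paper's method: the distance-based `ood_score`, the nearest-centroid classifiers, the `threshold` value, and the toy centroids (the unseen-class centroids stand in for statistics one might compute from synthesized unseen features).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_label(feature, centroids):
    """Return the label whose centroid is closest to `feature`."""
    return min(centroids, key=lambda label: euclidean(feature, centroids[label]))

def ood_score(feature, seen_centroids):
    """Hypothetical OOD score: distance to the nearest seen-class centroid."""
    return min(euclidean(feature, c) for c in seen_centroids.values())

def route_and_classify(feature, seen_centroids, unseen_centroids, threshold):
    """Send likely-unseen samples to the unseen classifier, others to the seen one."""
    if ood_score(feature, seen_centroids) > threshold:
        return nearest_label(feature, unseen_centroids)
    return nearest_label(feature, seen_centroids)

# Toy 2-D features; real audio-visual embeddings would be high-dimensional.
seen = {"dog": [0.0, 0.0], "cat": [1.0, 0.0]}
unseen = {"violin": [5.0, 5.0], "drum": [6.0, 4.0]}

print(route_and_classify([0.1, 0.1], seen, unseen, threshold=2.0))
print(route_and_classify([5.2, 4.9], seen, unseen, threshold=2.0))
```

In this toy setup the first sample lies close to the seen centroids and is handled by the seen-class classifier, while the second lies far from them and is handed to the unseen-class classifier. The gate keeps each classifier inside the region it was built for, which is one plausible way to limit the seen-class bias that GZSL methods tend to show.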
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Geophysical Methods and Applications
