HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues

Ankit Jha; Debabrata Pal; Mainak Singha; Naman Agarwal; Biplab Banerjee

arXiv:2309.13470 · cs.CV · September 26, 2023

Open Access

TL;DR

HAVE-Net is a novel framework that generates hallucinated audio-visual embeddings to improve few-shot classification in remote sensing, especially when one modality is missing during testing.

Contribution

This work introduces a new generative approach for meta-training cross-modal features from limited unimodal data in the remote sensing (RS) domain.

Findings

01

Classifiers trained with hallucinated embeddings outperform classifiers trained on real multimodal data by 0.8-2% on benchmark datasets.

02

Effective in scenarios with missing modalities during inference.

03

Demonstrates robustness in remote sensing classification tasks.

Abstract

Recognition of remote sensing (RS) or aerial images is currently of great interest, and advances in deep learning algorithms have added momentum to it in recent years. Challenges such as occlusion, intra-class variance, and lighting variation can arise when training neural networks on unimodal RS visual input. Even though joint training of audio-visual modalities improves classification performance in a low-data regime, it has yet to be thoroughly investigated in the RS domain. Here, we aim to solve a novel problem where both the audio and visual modalities are present during the meta-training of a few-shot learning (FSL) classifier; however, one of the modalities might be missing during the meta-testing stage. This problem formulation is pertinent in the RS domain, given the difficulties in data acquisition or sensor malfunctioning. To mitigate this, we propose a novel few-shot generative framework, Hallucinated…
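
The problem setting in the abstract (both modalities at meta-training, one missing at meta-testing) can be illustrated with a small sketch. This is not the paper's method: the generator here is a plain least-squares linear map from visual to audio embeddings, used only as a stand-in, and the classifier is a nearest-prototype rule. All names (`fit_linear_hallucinator`, `fuse`, `classify`), the embedding sizes, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_hallucinator(visual, audio):
    """Fit a least-squares map visual -> audio (hypothetical stand-in generator)."""
    weights, *_ = np.linalg.lstsq(visual, audio, rcond=None)
    return weights

def fuse(visual, weights):
    """Concatenate real visual embeddings with hallucinated audio embeddings."""
    return np.concatenate([visual, visual @ weights], axis=1)

def prototypes(features, labels, n_way):
    """Mean embedding per class, computed from the support set."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_way)])

def classify(queries, protos):
    """Assign each query to the class of its nearest prototype."""
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Meta-training: both modalities are available (synthetic embeddings here).
train_visual = rng.normal(size=(200, 16))
true_map = rng.normal(size=(16, 8))
train_audio = train_visual @ true_map + 0.1 * rng.normal(size=(200, 8))
weights = fit_linear_hallucinator(train_visual, train_audio)

# Meta-testing: a 3-way 5-shot episode where only the visual modality exists.
n_way, k_shot, n_query = 3, 5, 4
centers = rng.normal(scale=3.0, size=(n_way, 16))
support_labels = np.repeat(np.arange(n_way), k_shot)
query_labels = np.repeat(np.arange(n_way), n_query)
support_visual = centers[support_labels] + rng.normal(size=(n_way * k_shot, 16))
query_visual = centers[query_labels] + rng.normal(size=(n_way * n_query, 16))

# Fill in the missing audio modality, then classify in the fused space.
support_fused = fuse(support_visual, weights)
query_fused = fuse(query_visual, weights)
predictions = classify(query_fused, prototypes(support_fused, support_labels, n_way))
```

The point of the sketch is the data flow: a map learned while both modalities are present supplies the absent one at test time, so the few-shot classifier always sees a fused audio-visual representation.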

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Image Processing Techniques and Applications

Methods: Balanced Selection