Multi-label Zero-Shot Audio Classification with Temporal Attention

Duygu Dogan; Huang Xie; Toni Heittola; Tuomas Virtanen

arXiv:2409.00408·cs.SD·September 4, 2024

Open Access

TL;DR

This paper introduces a multi-label zero-shot audio classification method that uses temporal attention to focus on the audio segments most relevant to each class, improving accuracy over approaches that rely on temporally aggregated acoustic features.

Contribution

The study presents a new approach that applies temporal attention to enhance multi-label zero-shot audio classification, addressing the challenge of classifying multiple unseen sound classes.

Findings

01 Temporal attention improves classification accuracy.

02 Method outperforms baseline models on AudioSet subset.

03 Enhances zero-shot learning in multi-label audio tasks.

Abstract

Zero-shot learning models can classify new classes by transferring knowledge from seen classes using auxiliary information. While most existing zero-shot learning methods focus on single-label classification tasks, the present study introduces a method to perform multi-label zero-shot audio classification. To address the challenge of classifying multi-label sounds while generalizing to unseen classes, we adapt temporal attention. The temporal attention mechanism assigns importance weights to different audio segments based on their acoustic and semantic compatibility, enabling the model to capture the varying dominance of different sound classes within an audio sample by focusing on the segments most relevant to each class. This leads to more accurate multi-label zero-shot classification than methods employing temporally aggregated acoustic features…
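The attention step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the bilinear compatibility matrix `W`, the per-class softmax over time, the function names, and the random embeddings are all assumptions chosen for illustration, since the abstract does not specify the exact compatibility function. The sketch scores every segment against every class embedding, turns those scores into per-class attention weights over time, and compares the result with a mean-pooled (temporally aggregated) baseline.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def temporal_attention_scores(segments, class_embeddings, W):
    """Hypothetical attention-based scoring, one score per class.

    segments:         (T, d_a) acoustic embeddings, one per audio segment
    class_embeddings: (C, d_s) semantic embeddings, one per class label
    W:                (d_a, d_s) bilinear compatibility matrix (assumed form)
    """
    # Compatibility of every segment with every class: shape (T, C).
    compat = segments @ W @ class_embeddings.T
    # Attention weights over time, computed separately for each class.
    attn = softmax(compat, axis=0)
    # Clip-level score: attention-weighted sum of segment compatibilities.
    clip_scores = (attn * compat).sum(axis=0)
    return clip_scores, attn


def mean_pooled_scores(segments, class_embeddings, W):
    """Baseline that averages segments over time before scoring."""
    return segments.mean(axis=0) @ W @ class_embeddings.T


# Random stand-ins for real embeddings (illustrative values only).
rng = np.random.default_rng(0)
T, d_a, d_s, C = 10, 8, 5, 3
segments = rng.normal(size=(T, d_a))
class_embeddings = rng.normal(size=(C, d_s))
W = rng.normal(size=(d_a, d_s))

scores, attn = temporal_attention_scores(segments, class_embeddings, W)
baseline = mean_pooled_scores(segments, class_embeddings, W)

# Multi-label decision: an independent sigmoid per class, not a softmax
# across classes, so several classes can be active in the same clip.
probs = 1.0 / (1.0 + np.exp(-scores))
print(probs)
```

Because each class gets its own weights over time, a class that is audible in only a few segments can still receive a high clip-level score, whereas mean pooling dilutes it across the whole clip. With unseen classes, only `class_embeddings` would change; `W` is the part that would be learned from seen classes.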

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Softmax · Attention Is All You Need