Polyphonic audio event detection: multi-label or multi-class multi-task classification problem?

Huy Phan; Thi Ngoc Tho Nguyen; Philipp Koch; Alfred Mertins

arXiv:2201.12557·eess.AS·February 1, 2022

Open Access

TL;DR

This paper proposes a multi-class multi-task approach for polyphonic audio event detection, dividing event categories into groups to better handle overlaps and improve performance over traditional multi-label methods.

Contribution

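The paper introduces a multi-class multi-task framework for polyphonic audio event detection (AED), together with a network architecture devised for it. Dividing the event categories into groups avoids the combinatorial explosion that arises when every possible label combination is treated as its own class.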

Findings

1. Outperforms multi-label approaches on a synthetic dataset

2. Handles scenarios with high event overlap effectively

3. Improves detection accuracy and robustness

Abstract

Polyphonic events are the main error source of audio event detection (AED) systems. In the deep-learning context, the most common approach to dealing with event overlaps is to treat the AED task as a multi-label classification problem. By doing this, we inherently consider multiple one-vs.-rest classification problems, which are jointly solved by a single (i.e. shared) network. In this work, to better handle polyphonic mixtures, we propose to frame the task as a multi-class classification problem by considering each possible label combination as one class. To circumvent the large number of classes arising from combinatorial explosion, we divide the event categories into multiple groups and construct a multi-task problem in a divide-and-conquer fashion, where each of the tasks is a multi-class classification problem. A network architecture is then devised for multi-class multi-task modelling.…
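
The label reformulation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the consecutive grouping, the group size, the binary encoding of each label combination, and the helper names `split_into_groups`, `multihot_to_group_classes`, and `group_classes_to_multihot` are all assumptions made for illustration. It shows how one multi-hot frame label becomes one class index per group, and how grouping shrinks the number of classes compared with a single multi-class problem over all events.

```python
from typing import List, Sequence


def split_into_groups(num_events: int, group_size: int) -> List[List[int]]:
    """Partition event indices 0..num_events-1 into consecutive groups (assumed scheme)."""
    return [
        list(range(start, min(start + group_size, num_events)))
        for start in range(0, num_events, group_size)
    ]


def multihot_to_group_classes(
    multihot: Sequence[int], groups: List[List[int]]
) -> List[int]:
    """Map a multi-hot label to one class index per group.

    Within a group, each label combination is one class; here the class index
    is simply the binary number formed by the group's active events.
    """
    classes = []
    for group in groups:
        index = 0
        for bit, event in enumerate(group):
            index |= (multihot[event] & 1) << bit
        classes.append(index)
    return classes


def group_classes_to_multihot(
    classes: Sequence[int], groups: List[List[int]], num_events: int
) -> List[int]:
    """Invert the mapping: recover the multi-hot label from per-group classes."""
    multihot = [0] * num_events
    for cls, group in zip(classes, groups):
        for bit, event in enumerate(group):
            multihot[event] = (cls >> bit) & 1
    return multihot


# Hypothetical setup: 6 event categories split into 3 groups of 2.
num_events = 6
groups = split_into_groups(num_events, group_size=2)

# One frame in which events 0, 3, 4 and 5 are active at the same time.
frame_label = [1, 0, 0, 1, 1, 1]
group_classes = multihot_to_group_classes(frame_label, groups)

# Class counts: one multi-class problem over all events vs. one task per group.
single_task_classes = 2 ** num_events
multi_task_classes = sum(2 ** len(group) for group in groups)

print(groups)                                   # [[0, 1], [2, 3], [4, 5]]
print(group_classes)                            # [1, 2, 3]
print(single_task_classes, multi_task_classes)  # 64 12
```

Under these assumptions, each group would be served by its own softmax output over its label combinations, while a shared network body feeds all groups. The total number of output classes grows with the sum of the per-group combination counts rather than with their product.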

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies