Unified Audio Event Detection

Yidi Jiang; Ruijie Tao; Wen Huang; Qian Chen; Wen Wang

arXiv:2409.08552 · eess.AS · September 16, 2024

TL;DR

This paper introduces Unified Audio Event Detection (UAED), a task that simultaneously detects speaker-attributed speech events and non-speech sound events, and proposes a Transformer-based framework (T-UAED) that exploits the synergy between the two to outperform a baseline combining separate SED and SD models.

Contribution

The paper proposes the novel UAED task and a Transformer-based framework, T-UAED, that jointly models speech and non-speech sounds and improves over a baseline that combines SED and SD outputs.

Findings

01. T-UAED outperforms a baseline that combines SED and SD outputs.

02. T-UAED performs comparably to specialized models on the individual SED and SD tasks.

03. The framework effectively exploits interactions between the SED and SD tasks.

Abstract

Sound Event Detection (SED) detects regions of sound events, while Speaker Diarization (SD) segments speech conversations attributed to individual speakers. In SED, all speaker segments are classified as a single speech event, while in SD, non-speech sounds are treated merely as background noise. Thus, both tasks provide only partial analysis in complex audio scenarios involving both speech conversation and non-speech sounds. In this paper, we introduce a novel task called Unified Audio Event Detection (UAED) for comprehensive audio analysis. UAED explores the synergy between SED and SD tasks, simultaneously detecting non-speech sound events and fine-grained speech events based on speaker identities. To tackle this task, we propose a Transformer-based UAED (T-UAED) framework and construct the UAED Data derived from the Librispeech dataset and DESED soundbank. Experiments demonstrate…
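
To make the task definition concrete, the following is a minimal Python sketch of the kind of unified output UAED targets, built the naive way by merging the outputs of separate SED and SD systems. The `Event` type, the `merge_sed_sd` function, and all labels and timestamps are hypothetical illustrations and are not taken from the paper; T-UAED itself models speech and non-speech events jointly rather than merging them after the fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str     # e.g. "Dog", "Speech", or a speaker id such as "spk1"
    onset: float   # start time in seconds
    offset: float  # end time in seconds


def merge_sed_sd(sed_events, sd_segments, speech_label="Speech"):
    """Combine SED and SD outputs into one unified event list.

    Assumption (an illustrative baseline, not the paper's exact procedure):
    the single coarse speech class from SED is dropped and replaced by the
    per-speaker segments from SD; non-speech events are kept unchanged.
    """
    non_speech = [e for e in sed_events if e.label != speech_label]
    unified = non_speech + list(sd_segments)
    # Order events by time so the result reads like a timeline.
    return sorted(unified, key=lambda e: (e.onset, e.offset, e.label))


# Made-up example: SED sees one "Speech" region plus a dog bark,
# while SD splits the same speech region between two speakers.
sed_out = [Event("Speech", 0.0, 6.0), Event("Dog", 2.5, 3.5)]
sd_out = [Event("spk1", 0.0, 3.0), Event("spk2", 3.0, 6.0)]

unified = merge_sed_sd(sed_out, sd_out)
for e in unified:
    print(f"{e.label}: {e.onset:.1f}-{e.offset:.1f} s")
```

In this sketch the unified timeline contains the two speaker-level speech events and the overlapping dog bark, which is the combined view that neither SED nor SD provides on its own.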

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis