Joint Analysis of Sound Events and Acoustic Scenes Using Multitask Learning

Noriyuki Tonami; Keisuke Imoto; Ryosuke Yamanishi; Yoichi Yamashita

arXiv:2010.09213·cs.SD·February 24, 2021

TL;DR

This paper introduces a multitask learning approach that analyzes sound events and acoustic scenes jointly, exploiting the relationship between them to improve both sound event detection (SED) and acoustic scene classification (ASC).

Contribution

It proposes a multitask neural network that shares part of the network between sound event detection (SED) and acoustic scene classification (ASC), improving on models trained separately for each task.
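As a rough illustration of how two tasks can be trained together, the sketch below combines an event loss and a scene loss into one objective. It assumes a common multitask formulation (a weighted sum of binary cross-entropy over event classes and categorical cross-entropy over scene classes); this page does not state the paper's exact objective, and the names `multitask_loss`, `alpha`, and `beta` are hypothetical.

```python
import math

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over event classes.

    Events are multi-label: several can be active in the same frame,
    so each class gets its own independent probability.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

def categorical_cross_entropy(probs, target_index, eps=1e-12):
    """Cross-entropy for a single-label scene prediction."""
    return -math.log(max(probs[target_index], eps))

def multitask_loss(event_probs, event_targets, scene_probs, scene_index,
                   alpha=1.0, beta=1.0):
    """Weighted sum of the two task losses (assumed formulation).

    alpha and beta are hypothetical weights that balance the SED and
    ASC terms; both outputs are assumed to come from a shared network.
    """
    l_event = binary_cross_entropy(event_probs, event_targets)
    l_scene = categorical_cross_entropy(scene_probs, scene_index)
    return alpha * l_event + beta * l_scene

# Toy example: two event classes, two scene classes, uninformative outputs.
loss = multitask_loss([0.5, 0.5], [1, 0], [0.5, 0.5], 0)
print(round(loss, 4))
```

Because the gradient of this single loss flows through whatever layers the two branches share, those layers are pushed to encode features useful for both tasks. Setting `beta=0` in this sketch reduces it to training on the event loss alone.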

Findings

01

Improved F-score for SED by 1.31 percentage points.

02

Enhanced F-score for ASC by 1.80 percentage points.

03

Joint analysis is effective because sound events and acoustic scenes carry information about each other.
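The gains above are stated in percentage points, which is an absolute difference between two F-scores and not a relative change. A minimal sketch of that arithmetic follows, assuming the standard F1 definition (harmonic mean of precision and recall); this page does not say how the paper aggregates the score, and the counts below are made up for illustration only.

```python
def f_score(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, not results from the paper.
baseline = f_score(tp=80, fp=20, fn=20)
improved = f_score(tp=90, fp=10, fn=10)

# Absolute difference in percentage points.
gain_points = (improved - baseline) * 100
# Relative change, shown only to contrast with percentage points.
gain_relative = (improved - baseline) / baseline * 100

print(f"baseline F = {baseline:.2%}, improved F = {improved:.2%}")
print(f"gain = {gain_points:.2f} percentage points ({gain_relative:.2f}% relative)")
```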

Abstract

Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods, such as the convolutional neural network (CNN), recurrent neural network (RNN), and convolutional recurrent neural network (CRNN). The conventional methods address SED and ASC separately even though sound events and acoustic scenes are closely related to each other. For example, in the acoustic scene "office," the sound events "mouse clicking" and "keyboard typing" are likely to occur. Therefore, it is expected that information on sound events and acoustic scenes will be of mutual aid for SED and ASC. In this paper, we propose multitask learning for joint analysis of sound events and acoustic scenes, in which the parts of the networks holding information on sound events and…
