Exploring Differences between Human Perception and Model Inference in Audio Event Recognition

Yizhou Tan; Yanru Wu; Yuanbo Hou; Xin Xu; Hui Bu; Shengchen Li; Dick Botteldooren; Mark D. Plumbley

arXiv:2409.06580 · eess.AS · September 12, 2024

TL;DR

This paper investigates the discrepancies between human auditory perception and model inference in audio event recognition, introducing a new dataset and analysis to understand how models differ from human perception in identifying and detecting audio events.

Contribution

The paper presents the MAFAR dataset with multi-annotator labels, and analyzes the differences between human perception and model inference in semantic importance and event detection.

Findings

01

Humans ignore subtle or trivial events in semantic identification.

02

Models are easily affected by noisy events and tend to be more sensitive than humans in event detection.

03

A significant gap exists between human perception and model inference in AER.

Abstract

Audio Event Recognition (AER) traditionally focuses on detecting and identifying audio events. Most existing AER models tend to detect all potential events without considering their varying significance across different contexts. As a result, the events detected by existing models often differ substantially from human auditory perception. Although this is a critical and significant issue, it has not been extensively studied by the Detection and Classification of Sound Scenes and Events (DCASE) community because solving it is time-consuming and labour-intensive. To address this issue, this paper introduces the concept of semantic importance in AER, focusing on exploring the differences between human perception and model inference. This paper constructs a Multi-Annotated Foreground Audio Event Recognition (MAFAR) dataset, which comprises audio recordings labelled by 10 professional…

Code & Models

Repositories

voltmeter00/mafar
none · Official

Taxonomy

Topics: Music and Audio Processing