Impact of temporal resolution on convolutional recurrent networks for audio tagging and sound event detection

Wim Boes; Hugo Van hamme

arXiv:2209.12843·eess.AS·September 28, 2022

Open Access

TL;DR

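This paper analyzes how changing the temporal resolution of convolutional recurrent neural networks, which is done by adapting their pooling operations, affects performance in audio tagging and sound event detection. The effects are measured with several evaluation metrics under scenarios that differ in how much temporal localization they require.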

Contribution

The paper presents a thorough analysis of how temporal resolution, adjusted through the pooling operations, affects the performance of convolutional recurrent networks across sound recognition scenarios with different temporal localization needs.

Findings

01 Optimal temporal resolution varies with recognition scenario

02 Adjusting pooling operations significantly influences localization accuracy

03 Performance improvements depend on specific evaluation metrics

Abstract

Many state-of-the-art systems for audio tagging and sound event detection employ convolutional recurrent neural architectures. Typically, they are trained in a mean teacher setting to deal with the heterogeneous annotation of the available data. In this work, we present a thorough analysis of how changing the temporal resolution of these convolutional recurrent neural networks - which can be done by simply adapting their pooling operations - impacts their performance. By using a variety of evaluation metrics, we investigate the effects of adapting this design parameter under several sound recognition scenarios involving different needs in terms of temporal localization.
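The link between pooling and temporal resolution mentioned in the abstract can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' implementation: it assumes non-overlapping pooling along the time axis with no padding, and the function names, the 50 Hz input frame rate, and the pooling factors are all hypothetical values chosen for illustration.

```python
from math import prod
from typing import Sequence


def output_frames(n_input_frames: int, time_pool_factors: Sequence[int]) -> int:
    """Frames left after successive non-overlapping time pooling (assumed: no padding)."""
    frames = n_input_frames
    for factor in time_pool_factors:
        frames //= factor  # each pooling layer divides the time axis by its factor
    return frames


def output_frame_rate(input_rate_hz: float, time_pool_factors: Sequence[int]) -> float:
    """Frame rate of the network output, given the input feature frame rate."""
    return input_rate_hz / prod(time_pool_factors)


# Hypothetical example: a 10 s clip with features at 50 frames per second.
n_frames = 500
for factors in ([1, 1, 1], [2, 2, 1], [2, 2, 2]):
    print(factors, output_frames(n_frames, factors), output_frame_rate(50.0, factors))
```

Under these assumptions, larger time-pooling factors yield fewer output frames per clip, so each prediction covers a longer stretch of audio. This is the design parameter whose effect the paper studies.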

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies