A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection

Jinbo Hu; Yin Cao; Ming Wu; Qiuqiang Kong; Feiran Yang; Mark D. Plumbley; Jun Yang

arXiv:2203.10228 · cs.SD · March 22, 2022


TL;DR

This paper introduces a track-wise ensemble neural network, together with a data augmentation method, for polyphonic sound event localization and detection, achieving top performance in the L3DAS22 challenge.

Contribution

It extends the authors' earlier Event-Independent Network V2 with conformer and dense blocks, introduces a track-wise ensemble approach to handle track permutation across models, and employs diverse data augmentation strategies.
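Track-wise outputs have no fixed track order, so naively averaging several models' outputs can mix different events on the same track. To make the permutation issue concrete, the sketch below aligns each model's tracks to a reference model frame by frame (brute force over all permutations) before averaging. This is our own illustration under assumed array shapes; `align_tracks` and `track_wise_ensemble` are hypothetical names, and the paper's track-wise ensemble model is not claimed to work this way.

```python
import itertools

import numpy as np


def align_tracks(reference, candidate):
    """Reorder candidate tracks, frame by frame, to best match the reference.

    Both arrays have shape (num_tracks, num_frames, num_features), e.g. per-track
    event probabilities concatenated with DOA vectors (an assumed layout).
    """
    num_tracks = reference.shape[0]
    perms = [list(p) for p in itertools.permutations(range(num_tracks))]
    # costs[p, t] = squared distance at frame t when candidate tracks are reordered by perms[p]
    costs = np.stack([((reference - candidate[p]) ** 2).sum(axis=(0, 2)) for p in perms])
    best = costs.argmin(axis=0)  # index of the best permutation for each frame
    aligned = np.empty_like(candidate)
    for t, p_idx in enumerate(best):
        aligned[:, t] = candidate[perms[p_idx], t]
    return aligned


def track_wise_ensemble(outputs):
    """Average track-wise outputs of several models after aligning each to the first."""
    reference = outputs[0]
    aligned = [reference] + [align_tracks(reference, out) for out in outputs[1:]]
    return np.mean(aligned, axis=0)
```

Brute-force search costs `num_tracks!` comparisons per frame, which is cheap for the small track counts typical of track-wise SELD outputs but would not scale to many tracks.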

Findings

1. Achieved a location-dependent F-score of 0.699 in the L3DAS22 challenge
2. Proposed a track-wise ensemble model to address the track permutation problem
3. Used multiple data augmentation chains for improved robustness
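Each augmentation chain is described as a random combination of several augmentation operations. A minimal sketch of that idea follows, assuming spectrogram-like features of shape (channels, frames, bins) and using SpecAugment-style time and frequency masking plus small Gaussian noise as stand-in operations; the operation set, parameters, and the names `augmentation_chain` and `augment` are hypothetical, and the abstract does not say how the outputs of different chains are combined.

```python
import numpy as np


def time_mask(x, rng, max_width=20):
    """Zero a random block of frames; x has shape (channels, frames, bins)."""
    x = x.copy()
    width = rng.integers(1, max_width + 1)
    start = rng.integers(0, max(1, x.shape[1] - width + 1))
    x[:, start:start + width, :] = 0.0
    return x


def freq_mask(x, rng, max_width=8):
    """Zero a random band of frequency bins."""
    x = x.copy()
    width = rng.integers(1, max_width + 1)
    start = rng.integers(0, max(1, x.shape[2] - width + 1))
    x[:, :, start:start + width] = 0.0
    return x


def add_noise(x, rng, scale=0.01):
    """Add small Gaussian noise (returns a new array)."""
    return x + rng.normal(0.0, scale, size=x.shape)


# Hypothetical operation pool; these do not move sources, so labels stay valid.
OPERATIONS = [time_mask, freq_mask, add_noise]


def augmentation_chain(x, rng, max_ops=3):
    """Apply a random number of randomly chosen operations in sequence."""
    num_ops = rng.integers(1, max_ops + 1)
    for idx in rng.choice(len(OPERATIONS), size=num_ops, replace=True):
        x = OPERATIONS[idx](x, rng)
    return x


def augment(x, rng, num_chains=3):
    """Run several independent chains and return one augmented copy per chain."""
    return [augmentation_chain(x, rng) for _ in range(num_chains)]
```

The operations here leave event classes and locations unchanged; spatial operations such as rotating ambisonic channels would also require the DOA labels to be transformed accordingly.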

Abstract

Polyphonic sound event localization and detection (SELD) aims to detect the types of sound events together with their temporal activities and spatial locations. In this paper, a track-wise ensemble event independent network with a novel data augmentation method is proposed. The proposed model is based on our previously proposed Event-Independent Network V2 and is extended by conformer blocks and dense blocks. The track-wise ensemble model with a track-wise output format is proposed to solve a problem that arises when ensembling models with track-wise outputs: the track permutation may differ among models. The data augmentation approach contains several data augmentation chains, each composed of a random combination of several data augmentation operations. The method also utilizes log-mel spectrograms, intensity vectors, and Spatial Cues-Augmented Log-Spectrogram (SALSA) for different models.…
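Intensity vectors are one of the input features listed above. The sketch below computes log-power spectrograms and normalized intensity vectors from a first-order Ambisonics (FOA) STFT. It assumes channels ordered W, X, Y, Z (reorder first if your data use ACN order, W, Y, Z, X), omits the mel filterbank projection, and uses one common energy normalization; the function names and normalization are our assumptions, not necessarily the paper's exact feature pipeline.

```python
import numpy as np


def log_power(stft, eps=1e-8):
    """Per-channel log-power spectrogram in dB; stft has shape (4, frames, bins)."""
    return 10.0 * np.log10(np.abs(stft) ** 2 + eps)


def intensity_vectors(stft, eps=1e-8):
    """Active intensity Re{conj(W) * [X, Y, Z]}, normalized by an energy term."""
    w, xyz = stft[0], stft[1:]  # assumed channel order: W, X, Y, Z
    iv = np.real(np.conj(w)[None] * xyz)  # shape (3, frames, bins)
    energy = np.abs(w) ** 2 + (np.abs(xyz) ** 2).sum(axis=0) / 3.0 + eps
    return iv / energy[None]


def foa_features(stft):
    """Stack 4 log-power maps and 3 intensity-vector maps into (7, frames, bins)."""
    return np.concatenate([log_power(stft), intensity_vectors(stft)], axis=0)
```

For a single source, the three intensity-vector maps point roughly toward the source direction in each time-frequency bin, which is why they complement the log-power maps that mainly carry event-type information.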



Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies