Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation

Wangkai Jin; Junyu Liu; Jianfeng Ren; Xiangjun Peng

arXiv:2111.12869 · cs.SD · November 29, 2021 · 1 citation

TL;DR

This paper introduces a novel polyphonic sound event detection framework that leverages multi-type-multi-scale time-frequency representations and capsule neural networks to improve detection accuracy of overlapping sound events.

Contribution

The paper proposes a framework that combines multiple time-frequency representations (TFRs) with adaptive model fusion, using capsule neural networks for polyphonic sound event detection.

Findings

01. Achieved a 7% error rate reduction on the TUT-SED 2016 dataset.
02. Demonstrated the effectiveness of multi-type-multi-scale TFRs.
03. Validated the superiority of capsule neural networks in this task.
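
The listing does not say how the error rate is computed. As an illustration only, the sketch below assumes the segment-based error rate commonly reported on sound event detection benchmarks: substitutions, deletions, and insertions are counted per segment and divided by the number of active reference events. The function name `segment_error_rate` and the toy labels are hypothetical and do not come from the paper.

```python
import numpy as np

def segment_error_rate(reference, estimate):
    """Segment-based error rate (assumed metric, not confirmed by the source).

    reference, estimate: binary arrays of shape (n_segments, n_classes),
    where 1 means the class is active in that segment.
    """
    reference = np.asarray(reference, dtype=bool)
    estimate = np.asarray(estimate, dtype=bool)
    fn = np.sum(reference & ~estimate, axis=1)  # missed events per segment
    fp = np.sum(~reference & estimate, axis=1)  # false alarms per segment
    substitutions = np.minimum(fn, fp)          # a miss paired with a false alarm
    deletions = np.maximum(0, fn - fp)          # misses left unpaired
    insertions = np.maximum(0, fp - fn)         # false alarms left unpaired
    n_active = reference.sum()                  # total active reference events
    return (substitutions.sum() + deletions.sum() + insertions.sum()) / n_active

# Toy example: 3 segments, 3 event classes.
reference = [[1, 1, 0], [0, 1, 0], [1, 0, 0]]
estimate = [[1, 0, 0], [0, 1, 1], [0, 1, 0]]
er = segment_error_rate(reference, estimate)
print(er)
```

In the toy example the first segment contributes one deletion, the second one insertion, and the third one substitution, against four active reference events. The sketch does not guard against a reference with no active events.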

Abstract

The challenges of polyphonic sound event detection (PSED) stem from the detection of multiple overlapping events in a time series. Recent efforts exploit Deep Neural Networks (DNNs) on Time-Frequency Representations (TFRs) of audio clips as model inputs to mitigate such issues. However, existing solutions often rely on a single type of TFR, which causes under-utilization of input features. To this end, we propose a novel PSED framework, which incorporates Multi-Type-Multi-Scale TFRs. Our key insight is that TFRs of different types or at different scales can reveal acoustic patterns in a complementary manner, so that overlapped events can be best extracted by combining different TFRs. Moreover, our framework design applies a novel approach to adaptively fuse different models and TFRs symbiotically. Hence, the overall performance can be significantly improved. We…
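
To make the "different scales" idea concrete, here is a minimal sketch that computes magnitude spectrograms of one signal at several window lengths. It illustrates only the multi-scale part, using a plain short-time Fourier transform in NumPy. The window lengths, the half-window hop, the synthetic signal, and the helper names `stft_magnitude` and `multi_scale_tfrs` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def stft_magnitude(signal, win_length, hop_length):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # One real FFT per frame: shape (n_frames, win_length // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_scale_tfrs(signal, win_lengths=(256, 1024, 4096)):
    """Return one spectrogram per window length (hop is half the window)."""
    return {w: stft_magnitude(signal, w, w // 2) for w in win_lengths}

# Synthetic one-second clip at 16 kHz (illustrative values only):
# a steady 440 Hz tone overlapped by a very short broadband burst.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)
signal[8000:8032] += 1.0

tfrs = multi_scale_tfrs(signal)
for win_length, spec in tfrs.items():
    print(win_length, spec.shape)
```

Short windows give many frames with few frequency bins, which localizes the burst in time. Long windows give few frames with many bins, which resolves the tone in frequency. This trade-off is one reason representations at different scales can be complementary.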

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies