TL;DR
This paper introduces a Capsule Neural Network approach for polyphonic sound event detection, demonstrating improved robustness and accuracy over traditional CNNs through extensive evaluations on public datasets.
Contribution
The paper applies CapsNets to polyphonic sound event detection, showing they outperform CNNs and state-of-the-art methods in this domain.
Findings
CapsNets outperform CNNs in polyphonic SED tasks.
The proposed method achieves state-of-the-art results.
Extensive evaluations validate the effectiveness of CapsNets.
Abstract
Artificial sound event detection (SED) aims to mimic the human ability to perceive and understand what is happening in the surroundings. Deep Learning now offers valuable techniques for this goal, such as Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture was recently introduced in the image processing field to overcome some known limitations of CNNs, specifically their limited robustness to affine transformations (i.e., perspective, size, orientation) and their difficulty detecting overlapped images. This motivated us to employ CapsNets for the polyphonic SED task, in which multiple sound events occur simultaneously. Specifically, we propose to exploit the capsule units to represent a set of distinctive properties for each individual sound event. Capsule units are connected through a so-called…
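To make the polyphonic setting concrete, here is a minimal sketch of how one capsule per sound event could be read out as a multi-label activity matrix. It is not the authors' implementation. It assumes the convention of the original CapsNet formulation, where a capsule's output vector is passed through a "squash" nonlinearity and its length is read as the probability that the entity is present. The function names, the toy input, and the 0.5 threshold are all hypothetical.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-9):
    """Squash nonlinearity: keeps the vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(v * v, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * v / np.sqrt(sq_norm + eps)

def active_events(capsules, threshold=0.5):
    """Turn capsule vectors of shape (frames, classes, dim) into a boolean
    activity matrix of shape (frames, classes).

    The threshold is an illustrative assumption, not a value from the paper.
    """
    lengths = np.linalg.norm(squash(capsules), axis=-1)
    return lengths > threshold

# Hypothetical toy input: 2 time frames, 3 event classes, 4-dimensional capsules.
caps = np.zeros((2, 3, 4))
caps[0, 0] = [3.0, 0.0, 0.0, 0.0]  # class 0 strongly present in frame 0
caps[0, 2] = [0.0, 2.0, 0.0, 0.0]  # class 2 present in the same frame (overlap)
caps[1, 1] = [0.1, 0.0, 0.0, 0.0]  # class 1 only weakly activated in frame 1

activity = active_events(caps)
polyphony = activity.sum(axis=1)   # number of simultaneous events per frame
```

Because each class is thresholded independently, several events can be active in the same frame, which is what distinguishes polyphonic SED from single-label classification.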
