TL;DR
This paper introduces the Masked Conditional Neural Network (MCLNN), a neural network architecture for acoustic event recognition. It learns over frequency bands rather than individual frequency bins and automates feature hand-crafting, achieving competitive results on environmental sound recognition.
Contribution
The paper proposes the MCLNN, which applies a systematic binary mask to the network's weights so that each hidden unit learns from a band of frequencies. This design is aimed at the time-frequency structure of sound and advances neural network design for audio analysis.
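To make systematic weight masking concrete, here is a minimal Python sketch of a banded binary mask. It assumes, for illustration only, that each hidden unit connects to a contiguous band of `bandwidth` frequency bins, that consecutive units' bands overlap by `overlap` bins, and that bands wrap around the feature axis. The function name `band_mask` and its parameters are hypothetical, and this is not necessarily the paper's exact mask construction.

```python
def band_mask(n_features, n_hidden, bandwidth, overlap):
    """Hypothetical banded binary mask of shape [n_features][n_hidden].

    Hidden unit j is connected (value 1) to `bandwidth` consecutive feature
    bins starting at j * (bandwidth - overlap). Indices wrap around the
    feature axis (an illustrative assumption) so every unit sees a full band.
    """
    if not 0 <= overlap < bandwidth <= n_features:
        raise ValueError("need 0 <= overlap < bandwidth <= n_features")
    shift = bandwidth - overlap  # distance between starts of adjacent bands
    mask = [[0] * n_hidden for _ in range(n_features)]
    for j in range(n_hidden):
        start = j * shift
        for k in range(bandwidth):
            mask[(start + k) % n_features][j] = 1
    return mask


if __name__ == "__main__":
    # 6 frequency bins, 3 hidden units, bands of 3 bins overlapping by 1.
    for row in band_mask(6, 3, 3, 1):
        print(row)
```

Rows index frequency bins and columns index hidden units, so the mask has the same shape as a dense weight matrix and can be applied to it element-wise; zeros cut the connections outside each unit's band.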
Findings
MCLNN achieves competitive performance on environmental sound datasets.
The mask makes the network learn over frequency bands, mimicking a filterbank, and automates the search over feature combinations.
MCLNN outperforms some existing CNN models in sound recognition accuracy.
Abstract
Automatic feature extraction using neural networks has accomplished remarkable success for images, but for sound recognition, these models are usually modified to fit the nature of the multi-dimensional temporal representation of the audio signal in spectrograms. This may not efficiently harness the time-frequency representation of the signal. The ConditionaL Neural Network (CLNN) takes into consideration the interrelation between the temporal frames, and the Masked ConditionaL Neural Network (MCLNN) extends the CLNN by forcing a systematic sparseness over the network's weights using a binary mask. The masking allows the network to learn about frequency bands rather than bins, mimicking a filterbank used in signal transformations such as MFCC. Additionally, the mask is designed to consider various combinations of features, which automates the feature hand-crafting process. We…
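The conditioning and masking described in the abstract can be sketched in plain Python. This assumes a formulation in which the hidden activations at frame t are computed from a window of 2n+1 neighbouring frames, each frame offset u has its own weight matrix, and the MCLNN multiplies every such matrix element-wise by one shared binary mask. The names `mclnn_frame` and `order`, and the choice of a sigmoid activation, are illustrative assumptions rather than the paper's exact specification.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mclnn_frame(frames, t, weights, mask, bias, order):
    """Hypothetical masked conditional layer output for frame t.

    frames:  spectrogram segment, a list of frames, each a list of n_features values
    weights: 2*order+1 matrices, each [n_features][n_hidden]; weights[u + order]
             is the matrix for frame offset u
    mask:    binary [n_features][n_hidden] mask shared by all offsets
    bias:    list of n_hidden values
    order:   n; the window covers frames t-n .. t+n
    """
    if t - order < 0 or t + order >= len(frames):
        raise IndexError("window around frame t falls outside the segment")
    n_hidden = len(bias)
    z = list(bias)
    for u in range(-order, order + 1):
        x = frames[t + u]
        w = weights[u + order]
        for i, x_i in enumerate(x):
            for j in range(n_hidden):
                # The mask zeroes weights outside each hidden unit's band.
                z[j] += x_i * w[i][j] * mask[i][j]
    return [sigmoid(v) for v in z]


if __name__ == "__main__":
    frames = [[1, 0], [0, 1], [1, 1]]
    ones = [[1, 1], [1, 1]]
    identity_mask = [[1, 0], [0, 1]]
    print(mclnn_frame(frames, 1, [ones, ones, ones], identity_mask, [0, 0], order=1))
```

Evaluating this at every frame t whose window fits inside the segment gives the layer's output sequence, which is 2n frames shorter than its input. Under this sketch, an all-ones mask reduces the layer to an unmasked CLNN.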
