Masked Conditional Neural Networks for Automatic Sound Events Recognition

Fady Medhat; David Chesmore; John Robinson

arXiv:1802.05792·cs.LG·April 30, 2019

TL;DR

This paper introduces the Masked Conditional Neural Network (MCLNN), a neural architecture for environmental sound recognition that learns over frequency bands rather than individual bins and considers several feature combinations concurrently, achieving competitive results with fewer parameters.

Contribution

The work presents the MCLNN, which enforces systematic sparseness over the network's links so that it learns in frequency bands and becomes frequency shift invariant, in the manner of a filterbank, while remaining competitive with CNN-based models.

Findings

01. MCLNN achieved competitive accuracy on the ESC-10 and ESC-50 datasets.

02. The model used 12% of the parameters of an equivalent model based on state-of-the-art CNNs.

03. The reported performance was achieved without data augmentation.

Abstract

Deep neural network architectures designed for application domains other than sound, especially image recognition, may not optimally harness the time-frequency representation when adapted to the sound recognition problem. In this work, we explore the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) for multi-dimensional temporal signal recognition. The CLNN considers the inter-frame relationship, and the MCLNN enforces a systematic sparseness over the network's links to enable learning in frequency bands rather than bins, allowing the network to be frequency shift invariant, mimicking a filterbank. The mask also allows considering several combinations of features concurrently, which is usually handcrafted through exhaustive manual search. We applied the MCLNN to the environmental sound recognition problem using the ESC-10 and ESC-50 datasets. MCLNN…
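
As a rough illustration of the masking idea described in the abstract, the following Python sketch builds a binary band mask and applies it to a window of neighbouring frames. It is not taken from the paper or from the fadymedhat/MCLNN repository: the function names `band_mask` and `masked_frame_output`, the `bandwidth` and `overlap` parameters, and the wrap-around placement of the bands are all assumptions made for this example, and the authors' exact mask construction may differ.

```python
def band_mask(n_features, n_hidden, bandwidth, overlap):
    """Binary mask of shape (n_features x n_hidden): 1 keeps a link, 0 removes it.

    Hypothetical simplification: hidden unit j is linked to `bandwidth`
    consecutive feature bins, and each successive unit's band starts
    `bandwidth - overlap` bins later, wrapping around at the last bin.
    """
    if not 0 < bandwidth <= n_features:
        raise ValueError("bandwidth must be in 1..n_features")
    if not 0 <= overlap < bandwidth:
        raise ValueError("overlap must be in 0..bandwidth-1")
    step = bandwidth - overlap
    mask = [[0] * n_hidden for _ in range(n_features)]
    for j in range(n_hidden):
        start = (j * step) % n_features
        for k in range(bandwidth):
            mask[(start + k) % n_features][j] = 1
    return mask


def masked_frame_output(window, weights, mask, bias):
    """Pre-activation of one output frame, conditioned on a window of frames.

    window  : list of frames, each a list of n_features values
    weights : one (n_features x n_hidden) weight matrix per frame in the window
    mask    : binary mask applied element-wise to every weight matrix
    bias    : list of n_hidden bias values
    """
    n_hidden = len(bias)
    out = list(bias)
    for frame, w in zip(window, weights):
        for i, x in enumerate(frame):
            for j in range(n_hidden):
                # A masked-out link (mask[i][j] == 0) contributes nothing.
                out[j] += x * w[i][j] * mask[i][j]
    return out


if __name__ == "__main__":
    # 6 frequency bins, 4 hidden units, bands of 3 bins overlapping by 1.
    for row in band_mask(n_features=6, n_hidden=4, bandwidth=3, overlap=1):
        print(row)
```

In this sketch every hidden unit keeps exactly `bandwidth` links, so it responds to a band of neighbouring bins rather than to single bins, and shifting the band from one unit to the next is what loosely resembles a filterbank. Summing over the frames of the window stands in for the inter-frame conditioning attributed to the CLNN.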

Code & Models

Repositories

fadymedhat/MCLNN (tf, Official)
