Masked Conditional Neural Networks for Audio Classification

Fady Medhat; David Chesmore; John Robinson

arXiv:1803.02421·stat.ML·March 26, 2019

TL;DR

This paper introduces the Masked Conditional Neural Network (MCLNN), a neural network architecture for audio classification that exploits the temporal and spatial locality of features and achieves competitive accuracy on the GTZAN and ISMIR2004 music datasets.

Contribution

The paper proposes the MCLNN, which extends the CLNN with a binary mask that preserves the spatial locality of the features and automates the exploration of feature combinations, a role otherwise filled by hand-crafting features.

Findings

01. Achieved competitive recognition accuracies on the GTZAN and ISMIR2004 music datasets.
02. Surpassed several state-of-the-art neural-network-based architectures applied to both datasets.
03. Surpassed hand-crafted feature methods applied to the same datasets.

Abstract

We present the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN), designed for temporal signal recognition. The CLNN takes into consideration the temporal nature of the sound signal, and the MCLNN extends the CLNN with a binary mask that preserves the spatial locality of the features and allows an automated exploration of feature combinations, analogous to hand-crafting the most relevant features for the recognition task. The MCLNN has achieved competitive recognition accuracies on the GTZAN and ISMIR2004 music datasets, surpassing several state-of-the-art neural-network-based architectures and hand-crafted methods applied to both datasets.
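
The two ideas in the abstract, conditioning each output frame on a window of neighbouring frames and masking the weights so that each hidden unit sees only a local band of features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository listed under Code & Models for that). The function names `band_mask` and `masked_conditional_layer`, the evenly spaced band layout, the `bandwidth` parameter, and the ReLU activation are all assumptions made for illustration; the abstract does not specify the mask pattern or the activation.

```python
import numpy as np

def band_mask(n_in, n_out, bandwidth):
    """Binary (n_in, n_out) mask: each output unit sees a contiguous band of inputs.

    Hypothetical layout: band centres are spread evenly over the input
    feature axis. The paper's actual mask pattern may differ.
    """
    centres = np.linspace(0, n_in - 1, n_out)
    rows = np.arange(n_in)[:, None]  # shape (n_in, 1)
    return (np.abs(rows - centres[None, :]) <= bandwidth / 2).astype(np.float64)

def masked_conditional_layer(frames, weights, bias, mask):
    """Apply one masked, window-conditioned layer.

    frames:  (T, n_in) sequence of feature frames (e.g. spectrogram rows)
    weights: (d, n_in, n_out), one weight matrix per temporal offset in a
             window of d frames
    Returns: (T - d + 1, n_out), one output per full window.
    """
    d = weights.shape[0]
    n_windows = frames.shape[0] - d + 1
    out = np.zeros((n_windows, weights.shape[2]))
    for u in range(d):
        # Each temporal offset has its own weight matrix; the binary mask
        # zeroes connections outside each unit's feature band.
        out += frames[u:u + n_windows] @ (weights[u] * mask)
    return np.maximum(out + bias, 0.0)  # ReLU, chosen only for illustration

rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 8))   # 10 frames, 8 features each
mask = band_mask(8, 4, bandwidth=3)
weights = rng.standard_normal((3, 8, 4))  # window of 3 frames
out = masked_conditional_layer(frames, weights, np.zeros(4), mask)
print(out.shape)  # (8, 4)
```

Because the mask multiplies the weights elementwise, a feature outside a unit's band cannot influence that unit regardless of the learned weight values, which is one way to read "preserving spatial locality" in the abstract.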

Code & Models

Repositories

fadymedhat/MCLNN · tf · Official
