TL;DR
This paper introduces the Masked ConditionaL Neural Network (MCLNN), a neural network architecture for audio classification that exploits the temporal nature of the sound signal and the spatial locality of its features, reaching competitive accuracy on the GTZAN and ISMIR2004 music datasets.
Contribution
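The paper proposes the MCLNN, which extends the ConditionaL Neural Network (CLNN) with a binary mask. The mask preserves the spatial locality of the features and automates the exploration of feature combinations, a role analogous to hand-crafting the most relevant features for the recognition task.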
Findings
Achieved competitive recognition accuracies on GTZAN and ISMIR2004 datasets.
Surpassed several state-of-the-art neural network architectures.
Outperformed hand-crafted feature methods.
Abstract
We present the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN), designed for temporal signal recognition. The CLNN takes into consideration the temporal nature of the sound signal, and the MCLNN extends the CLNN with a binary mask that preserves the spatial locality of the features and allows an automated exploration of feature combinations, analogous to hand-crafting the most relevant features for the recognition task. The MCLNN has achieved competitive recognition accuracies on the GTZAN and ISMIR2004 music datasets, surpassing several state-of-the-art neural-network-based architectures and hand-crafted methods applied to both datasets.
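The abstract does not spell out the mask pattern or the layer equation, so the following is only a minimal sketch of the idea under stated assumptions. It assumes a banded mask (each output unit sees one contiguous band of input features, with neighbouring bands shifted and partly overlapping) and a temporal window of neighbouring frames, each with its own weight matrix. The function names `band_mask` and `masked_window_output`, the `bandwidth` and `overlap` parameters, and the sigmoid activation are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def band_mask(n_in, n_out, bandwidth, overlap):
    """Binary mask with n_in rows and n_out columns.

    Assumed pattern: column j keeps one contiguous band of `bandwidth`
    input features; consecutive bands share `overlap` features.
    """
    step = bandwidth - overlap  # shift of the band between neighbouring columns
    if step <= 0:
        raise ValueError("overlap must be smaller than bandwidth")
    mask = [[0] * n_out for _ in range(n_in)]
    for j in range(n_out):
        start = j * step
        for i in range(start, min(start + bandwidth, n_in)):
            mask[i][j] = 1
    return mask


def masked_window_output(frames, t, weights, mask, bias):
    """Output vector for frame t, conditioned on a window centred on t.

    weights[k] is the n_in x n_out matrix for the k-th frame of the window
    (an odd number of matrices is assumed). Every matrix is multiplied
    element-wise by the same binary mask, so masked-out links contribute 0.
    """
    n = len(weights) // 2  # frames on each side of the centre frame
    out = list(bias)
    for k, w in enumerate(weights):
        x = frames[t - n + k]
        for j in range(len(bias)):
            out[j] += sum(x[i] * w[i][j] * mask[i][j] for i in range(len(x)))
    # Sigmoid activation, an arbitrary choice for this sketch.
    return [1.0 / (1.0 + math.exp(-v)) for v in out]


if __name__ == "__main__":
    mask = band_mask(n_in=6, n_out=3, bandwidth=3, overlap=1)
    for row in mask:
        print(row)
```

In this sketch the mask is fixed while the weights would be learned, so each output unit can only combine a local band of neighbouring features. That restriction is what stands in for the "spatial locality" described in the abstract, and the shifted bands give different units different feature combinations to explore.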
