TL;DR
This paper introduces the Masked ConditionaL Neural Network (MCLNN), an architecture for sound recognition that enforces systematic sparseness on its links so the network learns in frequency bands. On music genre classification, it is reported to outperform state-of-the-art CNNs on the Ballroom dataset.
Contribution
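The paper presents MCLNN, a neural network architecture that applies a mask to its links so that it learns in frequency bands. The mask reduces susceptibility to frequency shifts in time-frequency representations and lets the network explore a range of feature combinations concurrently. The architecture is evaluated on music genre classification.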
Findings
MCLNN outperforms state-of-the-art CNNs on the Ballroom dataset.
MCLNN achieves competitive results with fewer parameters.
Masking enhances frequency-shift robustness.
Abstract
Neural network based architectures used for sound recognition are usually adapted from other application domains such as image recognition, which may not harness the time-frequency representation of a signal. The ConditionaL Neural Networks (CLNN) and its extension the Masked ConditionaL Neural Networks (MCLNN) are designed for multidimensional temporal signal recognition. The CLNN is trained over a window of frames to preserve the inter-frame relation, and the MCLNN enforces a systematic sparseness over the network's links that mimics a filterbank-like behavior. The masking operation induces the network to learn in frequency bands, which decreases the network susceptibility to frequency-shifts in time-frequency representations. Additionally, the mask allows an exploration of a range of feature combinations concurrently analogous to the manual handcrafting of the optimum collection of…
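The band-wise masking described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `band_mask` and `masked_dense`, the `bandwidth` and `overlap` parameters, the wrap-around at the top of the frequency axis, and the ReLU activation are all assumptions made for illustration. The idea shown is only that a binary mask, multiplied element-wise with a dense weight matrix, restricts each hidden unit to a contiguous band of frequency bins, much like one filter of a filterbank.

```python
import numpy as np


def band_mask(n_features, n_hidden, bandwidth, overlap):
    """Build a binary (n_features, n_hidden) mask (hypothetical helper).

    Column j keeps `bandwidth` consecutive frequency bins. Successive
    columns shift the band by (bandwidth - overlap) bins, wrapping around
    the frequency axis (an assumption made to keep the sketch short).
    """
    if not 0 <= overlap < bandwidth:
        raise ValueError("overlap must satisfy 0 <= overlap < bandwidth")
    step = bandwidth - overlap
    mask = np.zeros((n_features, n_hidden), dtype=np.float32)
    for j in range(n_hidden):
        start = (j * step) % n_features
        rows = (start + np.arange(bandwidth)) % n_features
        mask[rows, j] = 1.0
    return mask


def masked_dense(x, weights, bias, mask):
    """Dense layer whose weights are gated by the mask, followed by ReLU."""
    return np.maximum(0.0, x @ (weights * mask) + bias)


rng = np.random.default_rng(0)
mask = band_mask(n_features=8, n_hidden=4, bandwidth=3, overlap=1)
weights = rng.normal(size=(8, 4))
bias = np.zeros(4)
x = rng.normal(size=(2, 8))  # two frames, eight frequency bins each
out = masked_dense(x, weights, bias, mask)

# Perturbing a bin outside hidden unit 0's band (bins 0 to 2) leaves
# that unit's activation unchanged.
x_shifted = x.copy()
x_shifted[:, 5] += 10.0
out_shifted = masked_dense(x_shifted, weights, bias, mask)
```

Because the mask is applied to the weights rather than to the input, the same gating holds during training if the product `weights * mask` is recomputed on every forward pass, so masked links stay at zero effective strength.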
