Multi-scale Embedded CNN for Music Tagging (MsE-CNN)
Nima Hamidi, Mohsen Vahidzadeh, Stephen Baek

TL;DR
This paper introduces a multi-scale embedded CNN architecture for music tagging that enhances feature transfer across layers, leading to improved classification performance.
Contribution
It proposes a novel CNN model with intermediate connections for better multi-scale feature transfer in music tagging tasks.
Findings
Significant performance improvement over existing methods
Effective transfer of low-level features to final layers
Enhanced multi-scale feature integration
Abstract
Convolutional neural networks (CNNs) have recently attracted notable attention in a variety of machine learning tasks, including music classification and style tagging. In this work, we propose adding intermediate connections to the CNN architecture to facilitate the transfer of multi-scale, multi-level knowledge between different layers. Our novel model for music tagging shows significant improvement over approaches previously proposed in the literature, owing to its ability to carry low-level timbral features to the last layer.
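The idea of carrying features from every scale to the final layer can be illustrated with a toy sketch. This is not the authors' architecture: the 1-D signal, the fixed kernels, and the helper names (`conv1d`, `downsample`, `multiscale_embedding`) are all hypothetical and chosen only to show how a summary of each intermediate stage can be forwarded to the output, rather than only the last stage's features as in a plain stacked CNN.

```python
# Toy illustration only: fixed kernels stand in for learned filters, and a
# 1-D list stands in for a spectrogram. Not the model from the paper.

def conv1d(signal, kernel):
    """Valid 1-D cross-correlation followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out


def downsample(signal, factor=2):
    """Non-overlapping max pooling along time."""
    return [max(signal[i:i + factor])
            for i in range(0, len(signal) - factor + 1, factor)]


def global_mean(signal):
    """Collapse a feature map to a single scalar summary."""
    return sum(signal) / len(signal)


def multiscale_embedding(signal, kernels):
    """Run the stages in sequence and keep one pooled feature per stage."""
    features = []
    x = signal
    for kernel in kernels:
        x = downsample(conv1d(x, kernel))
        # Intermediate connection: this stage's summary goes straight
        # to the output vector instead of being discarded.
        features.append(global_mean(x))
    return features


# Hypothetical kernels and input, for demonstration.
kernels = [[0.5, 0.5], [1.0, -1.0], [0.25, 0.5, 0.25]]
signal = [float(i % 4) for i in range(32)]
embedding = multiscale_embedding(signal, kernels)
```

Here `embedding` holds one entry per stage, so a classifier placed on top of it would see the early, fine-scale summary alongside the later, coarser ones.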
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
