Multi-scale Embedded CNN for Music Tagging (MsE-CNN)

Nima Hamidi; Mohsen Vahidzadeh; Stephen Baek

arXiv:1906.06746 · cs.SD · June 18, 2019 · 1 cites

Open Access

TL;DR

This paper introduces a multi-scale embedded CNN architecture for music tagging that enhances feature transfer across layers, leading to improved classification performance.

Contribution

It proposes a novel CNN model with intermediate connections for better multi-scale feature transfer in music tagging tasks.

Findings

01 Significant performance improvement over existing methods

02 Effective transfer of low-level features to final layers

03 Enhanced multi-scale feature integration

Abstract

Convolutional neural networks (CNNs) have recently attracted notable attention in a variety of machine learning tasks, including music classification and style tagging. In this work, we propose adding intermediate connections to the CNN architecture to facilitate the transfer of multi-scale, multi-level knowledge between different layers. Our novel model for music tagging shows significant improvement over approaches previously proposed in the literature, owing to its ability to carry low-level timbral features to the last layer.
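The abstract does not spell out how the intermediate connections are wired, so the following is only a minimal NumPy sketch of one plausible reading: each block's output is global-average-pooled and the pooled vectors are concatenated, so that early (low-level) features reach the final tagging layer alongside late ones. The function names, channel widths, input shape, tag count, and the 1x1 channel mixing used in place of real convolutions are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def conv_block(x, weight):
    """Stand-in for a conv block: 1x1 channel mixing, ReLU, then 2x2 max pooling.

    x: (channels_in, freq, time); weight: (channels_out, channels_in).
    """
    y = np.maximum(np.einsum("oc,cft->oft", weight, x), 0.0)
    c, f, t = y.shape
    y = y[:, : f - f % 2, : t - t % 2]           # drop an odd edge so pooling divides evenly
    return y.reshape(c, f // 2, 2, t // 2, 2).max(axis=(2, 4))

def multi_scale_embedding(spectrogram, weights):
    """Run the blocks in sequence and pool every intermediate map into one vector."""
    x = spectrogram
    pooled = []
    for w in weights:
        x = conv_block(x, w)
        pooled.append(x.mean(axis=(1, 2)))       # global average pool: one value per channel
    return np.concatenate(pooled)                # early and late features side by side

def tag_probabilities(embedding, w_out, b_out):
    """Independent sigmoid per tag, treating tagging as a multi-label problem."""
    return 1.0 / (1.0 + np.exp(-(w_out @ embedding + b_out)))

rng = np.random.default_rng(0)
channels = [1, 8, 16, 32]                        # hypothetical channel widths
weights = [rng.normal(size=(o, i)) * 0.1
           for i, o in zip(channels[:-1], channels[1:])]
spec = rng.normal(size=(1, 96, 128))             # hypothetical (channel, mel bins, frames) input
emb = multi_scale_embedding(spec, weights)
n_tags = 50                                      # hypothetical tag vocabulary size
probs = tag_probabilities(emb, rng.normal(size=(n_tags, emb.size)) * 0.1, np.zeros(n_tags))
print(emb.shape, probs.shape)
```

In this sketch the final layer sees 8 + 16 + 32 pooled channels rather than only the last 32, which is the sense in which low-level features are carried to the end. A trained model would use learned convolutional kernels and could equally merge scales by addition or by resampled feature maps; the concatenation here is just the simplest choice to show.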

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies