Music Artist Classification with Convolutional Recurrent Neural Networks
Zain Nasrullah, Yue Zhao

TL;DR
This paper applies a convolutional recurrent neural network (CRNN) to music artist classification across a range of audio clip lengths and reports improved accuracy over baseline works, highlighting the importance of temporal structure in spectrogram features.
Contribution
It studies how audio clip length affects CRNN-based artist classification and empirically demonstrates improved performance over baselines on the artist20 dataset.
Findings
Best model achieves an F1 score of 0.937
Temporal structure enhances classification accuracy
Visualizations show meaningful feature clustering
Abstract
Previous attempts at music artist classification use frame-level audio features, which summarize frequency content within short intervals of time. Comparatively, more recent music information retrieval tasks take advantage of temporal structure in audio spectrograms using deep convolutional and recurrent models. This paper revisits artist classification with this new framework and empirically explores the impacts of incorporating temporal structure in the feature representation. To this end, an established classification architecture, a Convolutional Recurrent Neural Network (CRNN), is applied to the artist20 music artist identification dataset under a comprehensive set of conditions. These include audio clip length, which is a novel contribution in this work, and previously identified considerations such as dataset split and feature level. Our results improve upon baseline works, verify…
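To make the "audio clip length" condition concrete, here is a minimal sketch of how a song's spectrogram might be cut into fixed-length clips before being fed to a classifier. It is not the authors' code: the helper names `seconds_to_frames` and `slice_into_clips`, the 16 kHz sample rate, and the hop length of 512 samples are all assumptions chosen for illustration, and the spectrogram is a toy list of frames rather than real audio.

```python
from typing import List, Optional, Sequence

Frame = Sequence[float]  # one spectrogram column: energy per frequency bin


def seconds_to_frames(seconds: float,
                      sample_rate: int = 16000,   # assumed, not from the paper text
                      hop_length: int = 512) -> int:  # assumed, not from the paper text
    """Number of whole spectrogram frames that fit in `seconds` of audio."""
    return int(seconds * sample_rate / hop_length)


def slice_into_clips(spectrogram: Sequence[Frame],
                     clip_frames: int,
                     hop_frames: Optional[int] = None) -> List[List[List[float]]]:
    """Cut a time-major spectrogram into clips of `clip_frames` frames.

    `hop_frames` is the step between clip starts; it defaults to
    `clip_frames`, which gives non-overlapping clips. A trailing
    remainder shorter than `clip_frames` is dropped.
    """
    if clip_frames <= 0:
        raise ValueError("clip_frames must be positive")
    step = clip_frames if hop_frames is None else hop_frames
    if step <= 0:
        raise ValueError("hop_frames must be positive")

    clips = []
    last_start = len(spectrogram) - clip_frames
    for start in range(0, last_start + 1, step):
        clip = [list(frame) for frame in spectrogram[start:start + clip_frames]]
        clips.append(clip)
    return clips


# Toy spectrogram: 10 time frames, 2 frequency bins each (made-up values).
toy_spectrogram = [[float(t), float(t) * 0.5] for t in range(10)]

non_overlapping = slice_into_clips(toy_spectrogram, clip_frames=4)
overlapping = slice_into_clips(toy_spectrogram, clip_frames=4, hop_frames=2)
```

Each clip keeps the order of its frames, so a convolutional and recurrent model can use the temporal structure that frame-level features discard. Under the assumed parameters, a longer clip length in seconds simply maps to a larger `clip_frames` value via `seconds_to_frames`.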
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
