Graph Representation learning for Audio & Music genre Classification
Shubham Dokania, Vasudev Singh

TL;DR
This paper explores combining graph neural networks (GNNs) with CNNs for music genre classification, reporting state-of-the-art results on the GTZAN and AudioSet (Imbalanced Music) datasets and using visual analysis of spectrograms to show what the model attends to.
Contribution
It introduces a combination of CNN and GNN for audio classification and discusses the role of Siamese networks in learning edge similarity weights.
Findings
Achieved state-of-the-art results on the GTZAN and AudioSet (Imbalanced Music) datasets.
Demonstrated the effectiveness of GNNs in capturing genre-specific features.
Provided visual analysis of model focus on spectrograms.
Abstract
Music genre is arguably one of the most important and discriminative pieces of information for music and audio content. Visual-representation-based approaches have been explored on spectrograms for music genre classification. However, a lack of quality data and augmentation techniques makes it difficult to employ deep learning techniques successfully. We discuss the application of graph neural networks to this task, motivated by their strong inductive bias, and show that a combination of CNN and GNN achieves state-of-the-art results on the GTZAN and AudioSet (Imbalanced Music) datasets. We also discuss the role of Siamese neural networks as an analogue to GNNs for learning edge similarity weights. Finally, we perform visual analysis to understand our model's field of view into the spectrogram based on genre labels.
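The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea of passing per-clip embeddings through a graph layer whose edges are weighted by similarity. Everything in it is an assumption made for illustration: random vectors stand in for CNN features computed from spectrograms, plain cosine similarity stands in for the learned (Siamese-style) edge weights, and the function names `cosine_similarity_graph` and `gcn_layer` are hypothetical, not taken from the paper.

```python
import numpy as np

def cosine_similarity_graph(embeddings, k=3):
    """Build a symmetric k-nearest-neighbour graph weighted by cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pick a node as its own neighbour
    n = sim.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]  # k most similar nodes per row
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    adj = np.zeros((n, n))
    adj[rows, cols] = np.clip(sim[rows, cols], 0.0, None)  # keep weights non-negative
    return np.maximum(adj, adj.T)  # symmetrise the graph

def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

rng = np.random.default_rng(0)
# Assumption: 8 audio clips, each already mapped to a 16-dim vector by some CNN.
clip_embeddings = rng.normal(size=(8, 16))
graph = cosine_similarity_graph(clip_embeddings, k=3)
# Assumption: random projection to 4 outputs; a trained model would learn this.
layer_weight = rng.normal(size=(16, 4)) * 0.1
node_outputs = gcn_layer(graph, clip_embeddings, layer_weight)
print(node_outputs.shape)
```

In a trained system the edge weights and the layer weights would be learned rather than fixed or random; the sketch only shows how similarity-weighted edges let each clip's representation be mixed with those of its nearest neighbours.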
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
