Music Genre Classification using Large Language Models
Mohamed El Amine Meguenani, Alceu de Souza Britto Jr., and Alessandro Lameiras Koerich

TL;DR
This paper explores the zero-shot capabilities of pre-trained large language models and transformer architectures for music genre classification, and finds that the audio spectrogram transformer (AST) achieves the highest overall accuracy (85.5%) of all models compared.
Contribution
It introduces an approach that uses pre-trained LLMs (WavLM, HuBERT, wav2vec 2.0) to extract feature vectors from audio for genre classification, and compares these models with 1D and 2D CNNs and AST, highlighting the superior performance of transformer-based architectures.
Findings
AST achieves 85.5% overall accuracy, the highest of all models evaluated
Transformer-based models outperform CNNs and traditional methods
Zero-shot classification capability demonstrated
Abstract
This paper exploits the zero-shot capabilities of pre-trained large language models (LLMs) for music genre classification. The proposed approach splits audio signals into 20 ms chunks and processes them through convolutional feature encoders, a transformer encoder, and additional layers for coding audio units and generating feature vectors. The extracted feature vectors are used to train a classification head. During inference, predictions on individual chunks are aggregated for a final genre classification. We conducted a comprehensive comparison of LLMs, including WavLM, HuBERT, and wav2vec 2.0, with traditional deep learning architectures like 1D and 2D convolutional neural networks (CNNs) and the audio spectrogram transformer (AST). Our findings demonstrate the superior performance of the AST model, achieving an overall accuracy of 85.5%, surpassing all other models evaluated. These…
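The chunk-then-aggregate inference described in the abstract can be sketched as below. This is a minimal illustration, not the authors' implementation: the 16 kHz sample rate, zero-padding of the final chunk, averaging of per-chunk class probabilities as the aggregation rule, and the hypothetical `classify_chunk` callable (standing in for the convolutional feature encoder, transformer encoder, and trained classification head) are all assumptions.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono input
CHUNK_MS = 20         # chunk length stated in the abstract


def split_into_chunks(signal: Sequence[float],
                      sample_rate: int = SAMPLE_RATE,
                      chunk_ms: int = CHUNK_MS) -> List[List[float]]:
    """Split a 1-D signal into non-overlapping chunks.

    At 16 kHz, a 20 ms chunk is 16000 * 20 / 1000 = 320 samples.
    The trailing partial chunk is zero-padded (an assumption).
    """
    size = sample_rate * chunk_ms // 1000
    chunks = []
    for start in range(0, len(signal), size):
        chunk = list(signal[start:start + size])
        chunk.extend([0.0] * (size - len(chunk)))  # pad the last chunk
        chunks.append(chunk)
    return chunks


def aggregate_genre(chunk_probs: List[List[float]], genres: Sequence[str]) -> str:
    """Average per-chunk class probabilities and return the arg-max genre.

    Mean-probability voting is an assumed aggregation rule; majority
    voting over per-chunk arg-max labels is a common alternative.
    """
    if not chunk_probs:
        raise ValueError("no chunk predictions to aggregate")
    n = len(chunk_probs)
    mean = [sum(p[i] for p in chunk_probs) / n for i in range(len(genres))]
    best = max(range(len(genres)), key=lambda i: mean[i])
    return genres[best]


def classify_track(signal: Sequence[float],
                   classify_chunk: Callable[[List[float]], List[float]],
                   genres: Sequence[str]) -> str:
    """Hypothetical end-to-end inference: chunk, classify each chunk, aggregate."""
    chunks = split_into_chunks(signal)
    probs = [classify_chunk(c) for c in chunks]
    return aggregate_genre(probs, genres)
```

For example, with a toy two-class `classify_chunk` that favours the first genre only for loud chunks, a signal of one loud 20 ms chunk followed by two silent ones averages to roughly [0.37, 0.63] and is labelled with the second genre. Averaging probabilities lets confident chunks outweigh uncertain ones, whereas a plain majority vote treats every chunk equally.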
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies
