Toward Interpretable Music Tagging with Self-Attention

Minz Won; Sanghyuk Chun; Xavier Serra

arXiv:1906.04972 · cs.SD · June 13, 2019 · 35 cites

TL;DR

This paper introduces a self-attention-based deep sequence model for music tagging that is more interpretable than conventional approaches while reporting competitive performance. It is validated on MagnaTagATune and the Million Song Dataset, and its interpretability is illustrated with heat map visualizations.

Contribution

The paper presents a self-attention-based architecture for music tagging, consisting of shallow convolutional layers followed by stacked Transformer encoders, that is more interpretable than conventional fully convolutional or recurrent models.

Findings

01 Competitive results on MagnaTagATune and the Million Song Dataset

02 Interpretability demonstrated via heat map visualizations

03 More interpretable than fully convolutional and recurrent neural network approaches while remaining competitive with them

Abstract

Self-attention is an attention mechanism that learns a representation by relating different positions in the sequence. The transformer, which is a sequence model solely based on self-attention, and its variants achieved state-of-the-art results in many natural language processing tasks. Since music composes its semantics based on the relations between components in sparse positions, adopting the self-attention mechanism to solve music information retrieval (MIR) problems can be beneficial. Hence, we propose a self-attention based deep sequence model for music tagging. The proposed architecture consists of shallow convolutional layers followed by stacked Transformer encoders. Compared to conventional approaches using fully convolutional or recurrent neural networks, our model is more interpretable while reporting competitive results. We validate the performance of our model with the…
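The abstract's first sentence, that self-attention learns a representation by relating different positions in a sequence, can be illustrated with a minimal NumPy sketch of single-head scaled dot-product self-attention. This is not the authors' implementation: the function names, the random projection matrices, and the sizes (8 positions, 16 input features, 4 attention features) are assumptions chosen only for illustration. In the described architecture the input sequence would come from the shallow convolutional layers; here it is random data.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:   (seq_len, d_model) sequence of feature vectors
    w_*: (d_model, d_head) projection matrices (hypothetical, random here)
    Returns the attended sequence and the (seq_len, seq_len) weight matrix.
    """
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    # Each row scores one position against every position in the sequence.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical sizes, not taken from the paper.
seq_len, d_model, d_head = 8, 16, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape, weights.shape)
```

Each row of `weights` sums to 1 and describes how strongly one position attends to every other position. A matrix of this kind is one thing that could be rendered as a heat map, although this summary does not state how the paper's heat maps are produced.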

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies