MATT: A Multiple-instance Attention Mechanism for Long-tail Music Genre Classification
Xiaokai Liu, Menghua Zhang

TL;DR
This paper introduces MATT, a novel multi-instance attention mechanism that improves long-tail music genre classification by effectively identifying rare genres in imbalanced datasets, outperforming existing methods.
Contribution
The paper proposes a new multi-instance attention mechanism (MATT) tailored for long-tail music genre classification, enhancing accuracy on imbalanced datasets.
Findings
MATT significantly outperforms state-of-the-art baselines.
The approach effectively identifies rare, long-tail genres.
Experimental results on large-scale datasets validate its superiority.
Abstract
Imbalanced music genre classification, which identifies long-tail, data-poor genres from their associated music audio segments, is a crucial task in the Music Information Retrieval (MIR) field, and such imbalance is very prevalent in real-world scenarios. Most existing models are designed for class-balanced music datasets, resulting in poor accuracy and generalization when identifying genres at the tail of the distribution. Inspired by the success of Multi-instance Learning (MIL) in various classification tasks, we propose a novel mechanism named Multi-instance Attention (MATT) to improve performance on tail classes. Specifically, we first construct bag-level datasets by generating album-artist pair bags. Second, we leverage neural networks to encode the music audio segments. Finally, under the guidance of a multi-instance attention…
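The pipeline outlined in the abstract (album-artist bags, encoded segments, attention over the instances in each bag) can be illustrated with a small, dependency-free Python sketch. It is not the paper's implementation. It assumes that segment embeddings have already been produced by some encoder (the abstract does not specify the architecture), and that every segment in an album-artist bag shares the bag's genre label. In place of the paper's exact formulation, it uses a generic dot-product selective attention with one query vector per genre. The function names `build_bags`, `attention_pool`, and `classify_bag` and the per-genre query vectors are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical record layout: (segment_id, album, artist, genre, embedding)
Segment = Tuple[str, str, str, str, Vector]
BagKey = Tuple[str, str]


def build_bags(segments: List[Segment]) -> Tuple[Dict[BagKey, List[Vector]], Dict[BagKey, str]]:
    """Group pre-encoded segments into bags keyed by their (album, artist) pair."""
    bags: Dict[BagKey, List[Vector]] = defaultdict(list)
    labels: Dict[BagKey, str] = {}
    for _seg_id, album, artist, genre, emb in segments:
        key = (album, artist)
        bags[key].append(emb)
        # Assumption: all segments in one album-artist bag carry the same genre label.
        labels[key] = genre
    return dict(bags), labels


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Score each instance against a query, softmax the scores, and return the weighted sum."""
    if not instances:
        raise ValueError("a bag must contain at least one instance")
    weights = softmax([dot(inst, query) for inst in instances])
    dim = len(instances[0])
    bag_vec = [sum(w * inst[d] for w, inst in zip(weights, instances)) for d in range(dim)]
    return bag_vec, weights


def classify_bag(instances: List[Vector], class_queries: Dict[str, Vector]) -> Tuple[str, Dict[str, float]]:
    """Build one attention-pooled bag vector per genre query and score it against that query."""
    logits: Dict[str, float] = {}
    for genre, query in class_queries.items():
        bag_vec, _ = attention_pool(instances, query)
        logits[genre] = dot(bag_vec, query)
    return max(logits, key=logits.get), logits
```

Computing a separate attention distribution per candidate genre lets the pooling emphasize whichever segments in a bag look most like that genre, so a few informative segments can dominate a bag's representation. In a trained model the query vectors and the encoder would be learned jointly; here they are fixed inputs for illustration.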
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies
