TL;DR
This paper introduces an interpretable multimodal music auto-tagging framework that combines signal processing, deep learning, ontology engineering, and NLP to improve transparency while keeping tagging performance competitive.
Contribution
It presents a novel approach that clusters multimodal features into semantically meaningful groups and uses expectation maximization to weight each group by its contribution to tagging, which makes music auto-tagging more interpretable.
Findings
Achieves competitive tagging accuracy
Provides deeper understanding of feature contributions
Enhances transparency of music auto-tagging models
Abstract
Music auto-tagging is essential for organizing and discovering music in extensive digital libraries. While foundation models achieve exceptional performance in this domain, their outputs often lack interpretability, limiting trust and usability for researchers and end-users alike. In this work, we present an interpretable framework for music auto-tagging that leverages groups of musically meaningful multimodal features, derived from signal processing, deep learning, ontology engineering, and natural language processing. To enhance interpretability, we cluster features semantically and employ an expectation maximization algorithm, assigning distinct weights to each group based on its contribution to the tagging process. Our method achieves competitive tagging performance while offering a deeper understanding of the decision-making process, paving the way for more transparent and…
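The abstract does not give the exact expectation-maximization formulation. The sketch below is one plausible reading and rests on an assumption: each semantic feature group has its own fixed tag predictor, and EM learns only the mixing weights of a linear mixture over those predictors. The function name `em_group_weights`, the group labels, and the numbers are hypothetical and serve only to illustrate the idea. They are not the authors' code or data.

```python
import math


def em_group_weights(likelihoods, n_iter=100, tol=1e-8):
    """Learn mixture weights over feature groups with EM (illustrative sketch).

    likelihoods[n][g] is the probability that feature group g's fixed tag
    predictor assigns to the observed label of example n. All values must be
    positive. Only the mixing weights are estimated; the predictors are fixed.
    """
    n_examples = len(likelihoods)
    n_groups = len(likelihoods[0])
    weights = [1.0 / n_groups] * n_groups  # start from uniform weights
    prev_ll = -math.inf
    for _ in range(n_iter):
        totals = [0.0] * n_groups
        ll = 0.0
        for row in likelihoods:
            # E-step: posterior responsibility of each group for this example
            joint = [w * p for w, p in zip(weights, row)]
            z = sum(joint)
            ll += math.log(z)
            for g in range(n_groups):
                totals[g] += joint[g] / z
        # M-step: each weight becomes the group's average responsibility
        weights = [t / n_examples for t in totals]
        # Stop once the log-likelihood (under the previous weights) stops improving
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights


# Hypothetical example: three feature groups scored on three examples.
group_names = ["timbre", "rhythm", "lyrics"]  # illustrative labels only
likelihoods = [
    [0.90, 0.50, 0.20],
    [0.80, 0.40, 0.30],
    [0.95, 0.60, 0.10],
]
weights = em_group_weights(likelihoods)
for name, w in zip(group_names, weights):
    print(f"{name}: {w:.3f}")
```

In this toy data the hypothetical "timbre" predictor assigns the highest likelihood to every example, so EM moves nearly all of the weight onto that group. You can read the learned weights as per-group contribution scores, which is close in spirit to the interpretability the abstract describes.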
Taxonomy
Methods: Ontology
