Multimodal Metric Learning for Tag-based Music Retrieval

Minz Won; Sergio Oramas; Oriol Nieto; Fabien Gouyon; Xavier Serra

arXiv:2010.16030 · cs.IR · November 2, 2020 · 6 cites

TL;DR

This paper introduces a multimodal metric learning approach for tag-based music retrieval, leveraging triplet sampling, acoustic and cultural data, and domain-specific embeddings to improve retrieval accuracy and flexibility.

Contribution

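It proposes a multimodal metric learning framework for tag-based music retrieval. By using pretrained word embeddings as side information, the framework avoids the fixed-vocabulary limitation of classification-based tagging, and it combines elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings.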
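To illustrate why a shared embedding space removes the fixed-vocabulary constraint, here is a minimal sketch of retrieval by similarity ranking. It assumes tags and songs have already been mapped into the same vector space. The function names, the toy two-dimensional vectors, and the choice of cosine similarity are illustrative assumptions, not taken from the paper or its repository.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_songs(tag_embedding, song_embeddings):
    """Return song names ordered from most to least similar to the tag.

    Hypothetical helper: any tag that has an embedding can be used as a
    query, including tags that were never seen as training labels.
    """
    scored = [
        (cosine_similarity(tag_embedding, emb), name)
        for name, emb in song_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]

# Toy song embeddings (made-up values for illustration only).
songs = {
    "song_a": [0.9, 0.1],
    "song_b": [0.1, 0.9],
    "song_c": [0.6, 0.6],
}
query_tag = [1.0, 0.0]  # embedding of an arbitrary query tag
ranking = rank_songs(query_tag, songs)
```

Because the query is just a vector, nothing in the ranking step depends on a predefined list of tags. A classifier, by contrast, can only score the labels it was trained on.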

Findings

01 Quantitatively improved retrieval performance

02 Qualitatively improved retrieval results

03 Introduction of the MSD500 dataset with tags and user profiles

Abstract

Tag-based music retrieval is crucial for browsing large-scale music libraries efficiently. Hence, automatic music tagging has been actively explored, mostly as a classification task, which has an inherent limitation: a fixed vocabulary. On the other hand, metric learning enables flexible vocabularies by using pretrained word embeddings as side information. Metric learning has also proven suitable for cross-modal retrieval tasks in other domains (e.g., text-to-image), where it jointly learns a multimodal embedding space. In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings. Our experimental results show that the proposed ideas enhance the retrieval system both quantitatively and qualitatively. Furthermore, we…
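The triplet-based objective mentioned in the abstract can be sketched as follows. This is a generic margin-based triplet loss, not necessarily the exact formulation used in the paper: the tag embedding acts as the anchor, a song carrying the tag is the positive, and a song without it is the negative. The cosine similarity measure, the margin value of 0.4, and all names and vectors are assumptions chosen for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.4):
    """Margin-based triplet loss on cosine similarity.

    The loss is zero once the anchor is more similar to the positive
    than to the negative by at least `margin` (an assumed value).
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - sim_pos + sim_neg)

# Toy embeddings in a shared space (made-up values).
tag = [1.0, 0.0]            # anchor: embedding of a tag
song_with_tag = [1.0, 0.0]  # positive: a song annotated with the tag
song_without = [0.0, 1.0]   # negative: a song lacking the tag

easy_loss = triplet_loss(tag, song_with_tag, song_without)
hard_loss = triplet_loss(tag, song_without, song_with_tag)
```

In the first call the triplet is already well separated, so it contributes no loss. In the second, with the roles swapped, the loss is large. How triplets are selected decides which of these cases training sees, which is why the sampling strategy is one of the three ideas the abstract highlights.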

Code & Models

Repositories

minzwon/tag-based-music-retrieval
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis