Multi-Task Music Representation Learning from Multi-Label Embeddings
Alexander Schindler, Peter Knees

TL;DR
This paper introduces a new method for music representation learning that leverages multi-tag annotations and latent semantic indexing to improve triplet selection in a multi-task setting, enhancing multimedia retrieval.
Contribution
It proposes a novel triplet selection strategy using multi-tag annotations and latent semantic indexing for multi-task music representation learning.
Findings
Effective triplet selection improves representation quality.
Multi-tag annotations enable multi-task learning.
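The approach is evaluated in a multi-task scenario on the Million Song Dataset, using four newly introduced multi-tag annotation sets (genres, styles, moods, and themes).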
Abstract
This paper presents a novel approach to music representation learning. Triplet loss based networks have become popular for representation learning in various multimedia retrieval domains. Yet, one of the most crucial parts of this approach is the appropriate selection of triplets, which is indispensable, considering that the number of possible triplets grows cubically. We present an approach to harness multi-tag annotations for triplet selection, by using Latent Semantic Indexing to project the tags onto a high-dimensional space. From this we estimate tag-relatedness to select hard triplets. The approach is evaluated in a multi-task scenario for which we introduce four large multi-tag annotations for the Million Song Dataset for the music properties genres, styles, moods, and themes.
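To make the selection idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary track-by-tag matrix, uses a truncated SVD as the Latent Semantic Indexing step, and treats cosine similarity in the latent space as the tag-relatedness estimate. The function names (`lsi_embed`, `select_triplets`), the thresholds `pos_thresh` and `neg_thresh`, and the rule of taking the least similar positive and the most similar negative as the "hard" pair are all assumptions made for illustration.

```python
import numpy as np


def lsi_embed(tag_matrix, k):
    """Project tracks into a k-dimensional latent space via truncated SVD.

    tag_matrix: (n_tracks, n_tags) binary array, 1 if the track carries the tag.
    Returns L2-normalised track embeddings of shape (n_tracks, k).
    """
    u, s, _ = np.linalg.svd(tag_matrix.astype(float), full_matrices=False)
    emb = u[:, :k] * s[:k]  # keep only the k strongest latent dimensions
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.clip(norms, 1e-12, None)  # guard against all-zero rows


def select_triplets(tag_matrix, k=2, pos_thresh=0.8, neg_thresh=0.2):
    """Pick one (anchor, positive, negative) index triplet per track.

    Assumption: tracks with latent cosine similarity >= pos_thresh count as
    related, tracks with similarity <= neg_thresh count as unrelated.
    """
    emb = lsi_embed(tag_matrix, k)
    sim = emb @ emb.T  # cosine similarity, since rows are unit length
    n = sim.shape[0]
    triplets = []
    for a in range(n):
        pos = [i for i in range(n) if i != a and sim[a, i] >= pos_thresh]
        neg = [i for i in range(n) if i != a and sim[a, i] <= neg_thresh]
        if not pos or not neg:
            continue  # no valid triplet for this anchor
        hard_pos = min(pos, key=lambda i: sim[a, i])  # least similar positive
        hard_neg = max(neg, key=lambda i: sim[a, i])  # most similar negative
        triplets.append((a, hard_pos, hard_neg))
    return triplets


# Toy data (hypothetical): columns are the tags rock, metal, jazz, calm.
toy_tags = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
toy_triplets = select_triplets(toy_tags, k=2)
print(toy_triplets)
```

On the toy matrix, each anchor is paired with the track that shares its tags and contrasted with one of the two tracks that share none. Restricting candidates this way is what keeps the cubically growing space of possible triplets manageable; in a multi-task setting, the same routine would presumably be run once per annotation set (genres, styles, moods, themes).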
