Toward Universal Text-to-Music Retrieval

SeungHeon Doh; Minz Won; Keunwoo Choi; Juhan Nam

arXiv:2211.14558 · cs.IR · November 29, 2022

TL;DR

This paper proposes design strategies for a universal text-to-music retrieval system capable of handling various input types, achieving comparable performance across different query formats and generalizing to multiple music classification tasks.

Contribution

The paper introduces a benchmark and a set of design choices that let a single system handle both tag-level and sentence-level text queries for music retrieval, whereas most previous systems targeted only one query type.

Findings

1. Achieves comparable retrieval performance for tag- and sentence-level inputs.
2. Generalizes to 9 downstream music classification tasks.
3. Provides code and demo online for reproducibility.

Abstract

This paper introduces effective design choices for text-to-music retrieval systems. An ideal text-based retrieval system would support various input queries such as pre-defined tags, unseen tags, and sentence-level descriptions. In reality, most previous work has focused on a single query type (tag or sentence), which may not generalize to the other input type. Hence, we review recent text-based music retrieval systems using our proposed benchmark in two main aspects: input text representation and training objectives. Our findings enable a universal text-to-music retrieval system that achieves comparable retrieval performance on both tag- and sentence-level inputs. Furthermore, the proposed multimodal representation generalizes to 9 different downstream music classification tasks. We present the code and demo online.
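
To make the retrieval setting concrete, here is a minimal sketch of how a tag query and a sentence query could be served by the same index. It assumes text and music are mapped into one shared embedding space and ranked by cosine similarity; the abstract does not state the scoring function, so that choice is an assumption. The vectors, track ids, and function names below are hypothetical stand-ins for the outputs of trained text and audio encoders, not the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, track_vecs, top_k=2):
    """Rank tracks by similarity to the query embedding, best first."""
    scored = [(track_id, cosine(query_vec, vec))
              for track_id, vec in track_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical, hand-made embeddings. A real system would obtain these
# from a trained audio encoder (tracks) and a trained text encoder (queries).
TRACKS = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.1, 0.9, 0.1],
    "track_c": [0.0, 0.2, 0.9],
}
QUERIES = {
    "rock": [1.0, 0.0, 0.0],                             # tag-level query
    "a calm piano piece for studying": [0.0, 0.3, 1.0],  # sentence-level query
}

if __name__ == "__main__":
    for text, vec in QUERIES.items():
        ranked = [track_id for track_id, _ in retrieve(vec, TRACKS)]
        print(f"{text!r} -> {ranked}")
```

The point of the sketch is that both query types pass through the same `retrieve` call: once a tag and a sentence are embedded in the same space as the tracks, a single ranking routine can serve either kind of input.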

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Music Technology and Sound Studies