MuLan: A Joint Embedding of Music Audio and Natural Language

Qingqing Huang; Aren Jansen; Joonseok Lee; Ravi Ganti; Judith Yue Li; Daniel P. W. Ellis

arXiv:2208.12415 · eess.AS · August 29, 2022 · 31 citations

TL;DR

MuLan is a novel joint embedding model that links music audio directly to natural language descriptions, enabling versatile zero-shot music tagging and cross-modal retrieval across diverse genres and text styles.

Contribution

Introduces MuLan, a joint audio-text embedding model trained on 44 million recordings, allowing flexible, zero-shot music understanding beyond traditional ontology-based systems.

Findings

01. Effective zero-shot music tagging demonstrated
02. Versatile cross-modal retrieval capabilities shown
03. Supports diverse music genres and natural language descriptions

Abstract

Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedding model trained using 44 million music recordings (370K hours) and weakly-associated, free-form text annotations. Through its compatibility with a wide range of music genres and text styles (including conventional music tags), the resulting audio-text representation subsumes existing ontologies while graduating to true zero-shot functionalities. We demonstrate the versatility of the MuLan embeddings with a range of experiments including transfer learning, zero-shot music tagging, language…
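To make the zero-shot tagging idea concrete, the following is a minimal sketch of how a joint audio-text embedding space could be used to tag a clip without a fixed ontology. It is not MuLan's implementation: the three-dimensional vectors are hypothetical toy values standing in for the outputs of the audio and text towers, the function names are illustrative, and scoring by cosine similarity is an assumption (a common choice for joint embedding models) rather than something stated in the text above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_tag(audio_embedding, tag_embeddings):
    """Rank free-form text tags by similarity to one audio embedding.

    Any text can be added to tag_embeddings without retraining, which is
    what makes the tagging "zero-shot" in this sketch.
    """
    scores = {
        tag: cosine_similarity(audio_embedding, emb)
        for tag, emb in tag_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy embeddings; a real system would obtain these from the
# text tower (for tags) and the audio tower (for the clip).
tag_embeddings = {
    "jazz": [0.9, 0.1, 0.0],
    "heavy metal": [0.0, 0.2, 0.9],
    "solo piano": [0.6, 0.7, 0.1],
}
clip_embedding = [0.8, 0.3, 0.05]

ranked = zero_shot_tag(clip_embedding, tag_embeddings)
for tag, score in ranked:
    print(f"{tag}: {score:.3f}")
```

The same scoring could serve cross-modal retrieval by swapping roles: embed one text query, then rank a collection of audio embeddings by their similarity to it.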

Code & Models

Repositories

lucidrains/musiclm-pytorch
pytorch

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech Recognition and Synthesis