MusCaps: Generating Captions for Music Audio

Ilaria Manco; Emmanouil Benetos; Elio Quinton; Gyorgy Fazekas

arXiv:2104.11984 · cs.SD · December 9, 2021

TL;DR

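This paper introduces MusCaps, which the authors present as the first music audio captioning model: an encoder-decoder with temporal attention that generates natural language descriptions of music audio. It outperforms non-music captioning baselines, and pre-training the audio encoder proves important to caption quality.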

Contribution

The paper presents an audio captioning model for music that combines a multimodal encoder with pre-training on audio data. In doing so, it moves high-level music description in music information retrieval from classification toward natural language description.

Findings

01 Pre-training the audio encoder significantly improves caption quality.

02 MusCaps outperforms non-music captioning baselines.

03 Design choices such as the modality fusion strategy and the attention mechanism have only a marginal impact.

Abstract

Content-based music information retrieval has seen rapid progress with the adoption of deep learning. Current approaches to high-level music description typically make use of classification models, such as in auto-tagging or genre and mood classification. In this work, we propose to address music description via audio captioning, defined as the task of generating a natural language description of music audio content in a human-like manner. To this end, we present the first music audio captioning model, MusCaps, consisting of an encoder-decoder with temporal attention. Our method combines convolutional and recurrent neural network architectures to jointly process audio-text inputs through a multimodal encoder and leverages pre-training on audio data to obtain representations that effectively capture and summarise musical features in the input. Evaluation of the generated captions through…
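The temporal attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring function, the function names (`softmax`, `temporal_attention`), and the toy feature vectors are all assumptions made for illustration. The sketch shows only the general idea, in which a decoder state weights each audio frame and the weighted frames are summed into a context vector for the next word prediction.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(decoder_state, audio_frames):
    """Weight audio frames by their relevance to the current decoder state.

    decoder_state: list of floats (hypothetical decoder hidden state).
    audio_frames:  list of frames, each a list of floats with the same
                   length as decoder_state (hypothetical encoder outputs).
    Dot-product scoring is an assumption; the source does not specify
    which scoring function the model uses.
    """
    # One relevance score per time step.
    scores = [
        sum(h * f for h, f in zip(decoder_state, frame))
        for frame in audio_frames
    ]
    # Normalise scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the frames.
    dim = len(audio_frames[0])
    context = [
        sum(w * frame[d] for w, frame in zip(weights, audio_frames))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three time steps of 2-dimensional audio features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = temporal_attention(state, frames)
print(weights)
print(context)
```

In this toy input the first and third frames align with the decoder state equally well, so they receive equal weight, while the second frame receives less. A real model would learn the scoring function and operate on tensors from the audio encoder rather than Python lists.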

Code & Models

Repositories

ilaria-manco/muscaps (PyTorch, official)
