Melody transcription via generative pre-training

Chris Donahue; John Thickstun; Percy Liang

arXiv:2212.01884 · cs.SD · December 6, 2022

TL;DR

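This paper introduces a melody transcription method that uses representations from Jukebox, a generative model of broad music audio, together with a new dataset of 50 hours of crowdsourced melody transcriptions. The method improves transcription performance by 20% relative to conventional spectrogram features and enables transcribing lead sheets directly from audio.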

Contribution

The paper combines generative pre-training with a new dataset of 50 hours of crowdsourced annotations to improve melody transcription across diverse musical styles and instruments.

Findings

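01. Jukebox representations improve melody transcription performance by 20% relative to conventional spectrogram features.

02. Combining generative pre-training with the new dataset yields 77% stronger performance.

03. The approach enables transcribing lead sheets directly from audio.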

Abstract

Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles; existing strategies work well for some melody instruments or styles but not all. To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio, thereby improving performance on melody transcription by 20% relative to conventional spectrogram features. Another obstacle in melody transcription is a lack of training data: we derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music. The combination of…

Code & Models

Repositories

chrisdonahue/sheetsage (official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Dense Connections · Residual Connection · Layer Normalization · Dilated Convolution · Position-Wise Feed-Forward Layer · VQ-VAE · Convolution · Jukebox