Audio-to-symbolic Arrangement via Cross-modal Music Representation Learning

Ziyu Wang; Dejing Xu; Gus Xia; Ying Shan

arXiv:2112.15110·cs.SD·February 23, 2022

TL;DR

This paper presents a cross-modal learning model that automatically derives the score of a piano accompaniment from the audio of a pop song, combining information extracted from the audio with prior knowledge of piano composition so that the arrangement both resembles the audio and remains musical.

Contribution

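The study introduces a cross-modal representation-learning model that extracts chord and melodic information from the audio and learns a texture representation from both the audio and a corrupted ground-truth arrangement. A tailored training strategy gradually shifts the source of texture information from the corrupted score to the audio, so that no ground-truth score is needed at inference time.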

Findings

01. Model outperforms baselines in arrangement quality
02. Successfully captures key musical features from audio
03. Enables score generation solely from audio during inference

Abstract

Could we automatically derive the score of a piano accompaniment based on the audio of a pop song? This is the audio-to-symbolic arrangement problem we tackle in this paper. A good arrangement model should not only consider the audio content but also have prior knowledge of piano composition (so that the generation "sounds like" the audio and meanwhile maintains musicality). To this end, we contribute a cross-modal representation-learning model, which 1) extracts chord and melodic information from the audio, and 2) learns texture representation from both audio and a corrupted ground truth arrangement. We further introduce a tailored training strategy that gradually shifts the source of texture information from corrupted score to audio. In the end, the score-based texture posterior is reduced to a standard normal distribution, and only audio is needed for inference. Experiments show that…
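The training strategy described in the abstract, in which the score-based texture posterior is gradually reduced to a standard normal distribution, can be illustrated with a weighted KL term. The sketch below is only an illustration under stated assumptions: the abstract does not specify the schedule, so the linear ramp, the function names `kl_to_standard_normal` and `kl_weight`, and the `max_weight` parameter are all hypothetical and are not taken from the paper or its repository.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    The value is zero exactly when the posterior equals the standard normal,
    i.e. when it no longer carries any information from the score.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def kl_weight(step, total_steps, max_weight=1.0):
    """Hypothetical linear ramp of the KL weight from 0 to max_weight.

    Assumption: the real schedule may differ; this only shows the idea of
    penalising the score-based posterior more strongly as training proceeds.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * frac


if __name__ == "__main__":
    # A toy 2-dimensional posterior that still deviates from N(0, I).
    mu, log_var = [1.0, -0.5], [0.0, -1.0]
    for step in (0, 500, 1000):
        penalty = kl_weight(step, total_steps=1000) * kl_to_standard_normal(mu, log_var)
        print(step, penalty)
```

As the weight grows, any information the posterior takes from the corrupted score becomes more expensive, which pushes the model to rely on the audio instead. Once the posterior matches the standard normal, a texture code can be sampled from the prior and only audio is required.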

Code & Models

Repositories

zzwaang/audio2midi (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies