Audio-to-symbolic Arrangement via Cross-modal Music Representation Learning
Ziyu Wang, Dejing Xu, Gus Xia, Ying Shan

TL;DR
This paper presents a cross-modal representation-learning model that automatically derives piano accompaniment scores from audio recordings of pop songs. It combines prior knowledge of piano composition with audio content to improve arrangement quality.
Contribution
The study introduces a novel cross-modal representation-learning framework that combines audio and corrupted score data, enabling effective audio-to-symbolic piano arrangement without relying on ground truth scores during inference.
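The texture branch is trained on a corrupted ground-truth score, but this page does not describe the corruption procedure. The sketch below is one hypothetical way to produce such a corrupted input. It assumes a score stored as a list of notes on a 16th-note grid and uses a hypothetical `corrupt_score` helper that randomly drops notes and jitters onsets. The `Note` type, the grid, and both corruption operations are illustrative assumptions, not the authors' method.

```python
import random
from typing import List, NamedTuple


class Note(NamedTuple):
    onset: int     # onset in 16th-note steps (assumed quantization)
    pitch: int     # MIDI pitch number
    duration: int  # duration in 16th-note steps


def corrupt_score(notes: List[Note], drop_prob: float = 0.3,
                  max_shift: int = 1, seed: int = 0) -> List[Note]:
    """Hypothetical corruption: randomly drop notes and jitter their onsets.

    Illustrative stand-in only; not the paper's corruption scheme.
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    corrupted = []
    for note in notes:
        if rng.random() < drop_prob:
            continue  # drop this note entirely
        shift = rng.randint(-max_shift, max_shift)  # onset jitter in grid steps
        corrupted.append(note._replace(onset=max(0, note.onset + shift)))
    # Jitter can reorder notes, so re-sort by onset, then pitch.
    return sorted(corrupted, key=lambda n: (n.onset, n.pitch))


score = [Note(0, 60, 4), Note(4, 64, 4), Note(8, 67, 4), Note(12, 72, 4)]
print(corrupt_score(score, drop_prob=0.25, seed=42))
```

A corrupted score keeps partial texture information (rhythm density, voicing) while withholding the exact notes. That is one reason such an input can serve as an auxiliary texture source during training.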
Findings
The model outperforms baselines in arrangement quality
Captures chord, melody, and texture information from audio
Enables score generation solely from audio during inference
Abstract
Could we automatically derive the score of a piano accompaniment from the audio of a pop song? This is the audio-to-symbolic arrangement problem we tackle in this paper. A good arrangement model should not only consider the audio content but also have prior knowledge of piano composition, so that the generated arrangement "sounds like" the audio while remaining musical. To this end, we contribute a cross-modal representation-learning model, which 1) extracts chord and melodic information from the audio, and 2) learns texture representation from both the audio and a corrupted ground-truth arrangement. We further introduce a tailored training strategy that gradually shifts the source of texture information from the corrupted score to the audio. By the end of training, the score-based texture posterior is reduced to a standard normal distribution, and only audio is needed for inference. Experiments show that…
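The abstract does not spell out the training schedule. One plausible reading, assumed here, is that a KL term pulls the score-based texture posterior toward N(0, I), and its weight ramps up during training. Once that posterior collapses to the prior, it carries no score information, so texture must come from the audio branch. The linear ramp, the `warmup_steps` parameter, and the helper names below are hypothetical. The KL formula itself is the standard closed form for a diagonal Gaussian against a standard normal.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def score_kl_weight(step: int, warmup_steps: int, max_weight: float = 1.0) -> float:
    """Hypothetical linear ramp: weight on the score-branch KL grows from 0 to max_weight."""
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / warmup_steps)


def texture_loss(recon_loss: float, score_mu: Sequence[float],
                 score_logvar: Sequence[float], step: int,
                 warmup_steps: int, max_weight: float = 1.0) -> float:
    """Reconstruction loss plus the scheduled KL penalty on the score-based posterior."""
    weight = score_kl_weight(step, warmup_steps, max_weight)
    return recon_loss + weight * kl_to_standard_normal(score_mu, score_logvar)


# Early in training the score posterior is barely penalized; later it is pushed to N(0, I).
print(texture_loss(2.0, [1.0, -0.5], [0.0, 0.2], step=100, warmup_steps=1000))
print(texture_loss(2.0, [1.0, -0.5], [0.0, 0.2], step=1000, warmup_steps=1000))
```

A linear ramp is only one option. Step-wise or sigmoid schedules are also common for KL annealing, and the paper's actual strategy may differ.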
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
