Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction
Jun-You Wang, Li Su

TL;DR
This paper introduces a BERT-like model for symbolic music understanding, utilizing novel token denoising and pianoroll prediction objectives to improve performance across diverse musical tasks.
Contribution
It presents two pre-training objectives tailored to symbolic music, token denoising and pianoroll prediction, which help a BERT-like model learn musical knowledge such as pitch intervals.
Findings
Achieves competitive results on 12 downstream music tasks.
The pre-training objectives improve the model's understanding of pitch and musical structure.
Demonstrates effectiveness of token denoising and pianoroll prediction in music modeling.
Abstract
We propose a pre-trained BERT-like model for symbolic music understanding that achieves competitive performance across a wide range of downstream tasks. To achieve this target, we design two novel pre-training objectives, namely token correction and pianoroll prediction. First, we sample a portion of note tokens and corrupt them with a limited amount of noise, and then train the model to denoise the corrupted tokens; second, we also train the model to predict bar-level and local pianoroll-derived representations from the corrupted note tokens. We argue that these objectives guide the model to better learn specific musical knowledge such as pitch intervals. For evaluation, we propose a benchmark that incorporates 12 downstream tasks ranging from chord estimation to symbolic genre classification. Results confirm the effectiveness of the proposed pre-training objectives on downstream tasks.
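The two objectives described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the token vocabulary, the noise distribution, or the exact pianoroll-derived representation. The sketch assumes notes are `(onset_step, pitch, duration_steps)` tuples, that "limited noise" means shifting a pitch by a few semitones, and that the bar-level target is a 12-dimensional pitch-class activation vector. The function names and the parameters `ratio`, `max_shift`, and `steps_per_bar` are all hypothetical.

```python
import random


def corrupt_pitches(pitches, ratio=0.15, max_shift=6, seed=0):
    """Corrupt a sampled portion of MIDI pitches with bounded noise.

    Assumption: "limited noise" is modeled as a nonzero shift of at most
    `max_shift` semitones. Returns the corrupted list and the corrupted
    positions; a denoising model would be trained to recover the original
    pitch at those positions.
    """
    rng = random.Random(seed)
    n = len(pitches)
    if n == 0:
        return [], []
    k = max(1, round(ratio * n))
    positions = sorted(rng.sample(range(n), k))
    corrupted = list(pitches)
    for i in positions:
        # Only keep candidates that differ from the original and stay
        # inside the valid MIDI pitch range 0..127.
        candidates = [
            pitches[i] + s
            for s in range(-max_shift, max_shift + 1)
            if s != 0 and 0 <= pitches[i] + s <= 127
        ]
        corrupted[i] = rng.choice(candidates)
    return corrupted, positions


def bar_pitch_class_targets(notes, steps_per_bar=16):
    """Build one 12-dim binary pitch-class vector per bar.

    Assumption: this stands in for the paper's bar-level
    pianoroll-derived representation. `notes` holds
    (onset_step, pitch, duration_steps) tuples with duration >= 1.
    """
    if not notes:
        return []
    end = max(onset + dur for onset, _, dur in notes)
    n_bars = -(-end // steps_per_bar)  # ceiling division
    targets = [[0] * 12 for _ in range(n_bars)]
    for onset, pitch, dur in notes:
        first_bar = onset // steps_per_bar
        last_bar = (onset + dur - 1) // steps_per_bar
        # Mark the pitch class in every bar the note sounds in.
        for bar in range(first_bar, last_bar + 1):
            targets[bar][pitch % 12] = 1
    return targets


if __name__ == "__main__":
    clean = [(0, 60, 4), (4, 64, 4), (14, 67, 4)]
    noisy_pitches, positions = corrupt_pitches([p for _, p, _ in clean], ratio=0.34)
    print("corrupted positions:", positions, "pitches:", noisy_pitches)
    print("bar targets:", bar_pitch_class_targets(clean))
```

In this sketch the pianoroll targets are computed from the clean notes while the model input would be the corrupted sequence, so both objectives push the model to reason about what the uncorrupted pitches should have been.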
