A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation
Mingshuo Ding, Yinghao Ma

TL;DR
This paper introduces a transformer-based autoencoder with MIDI augmentation for classifying whether a music excerpt is AI-generated or human-composed, achieving competitive results with a compact model.
Contribution
It proposes a novel masked language model approach based on ALBERT for composer classification, combined with data augmentation and a refined loss function to improve performance on small datasets.
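A masked language model is trained by hiding some tokens of the input and asking the model to recover them. The following is a minimal sketch under a few assumptions: each excerpt is a plain list of MIDI pitch numbers, and the standard BERT/ALBERT masking recipe applies (select 15% of positions, then replace 80% of them with [MASK], 10% with a random pitch, and leave 10% unchanged). The token ids and the function name are hypothetical. The paper does not publish them.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical vocabulary: MIDI pitches 0..127 plus a [MASK] token.
# This is an assumption, not the paper's published vocabulary.
NUM_PITCHES = 128
MASK_ID = 128
IGNORE = -100  # label value meaning "no loss at this position"


def mask_pitch_sequence(
    pitches: List[int],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking of a pitch sequence; returns (inputs, labels)."""
    rng = rng or random.Random()
    inputs = list(pitches)
    labels = [IGNORE] * len(pitches)
    for i, pitch in enumerate(pitches):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = pitch  # the model must recover the original pitch here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID  # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(NUM_PITCHES)  # 10%: random pitch
        # remaining 10%: keep the original pitch unchanged
    return inputs, labels


inputs, labels = mask_pitch_sequence([60, 62, 64, 65, 67], rng=random.Random(0))
```

The loss is computed only at positions whose label is not `IGNORE`. The model therefore learns the statistics of the pitch sequences it is trained on, here AI-composed MIDI.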
Findings
Achieved 3rd place in the CSMT 2020 data challenge
Reduced the number of model parameters while maintaining accuracy
Demonstrated the effectiveness of data augmentation on small datasets
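ALBERT has two standard parameter-saving techniques. Factorized embeddings replace one V×H table with a small V×E table followed by an E×H projection. Cross-layer sharing reuses one transformer layer's weights for every layer. The back-of-the-envelope count below uses hypothetical sizes, not the configuration reported in the paper. It counts weight matrices only.

```python
from typing import Optional


def embedding_params(vocab: int, hidden: int, emb: Optional[int] = None) -> int:
    """Weights of the token-embedding block.

    BERT-style (emb is None): one V x H lookup table.
    ALBERT-style (emb given): a V x E lookup table plus an E x H projection.
    """
    if emb is None:
        return vocab * hidden
    return vocab * emb + emb * hidden


def encoder_params(hidden: int, ffn: int, layers: int, shared: bool) -> int:
    """Weights of the transformer stack (biases and LayerNorm omitted)."""
    attention = 4 * hidden * hidden  # Q, K, V and output projections
    feed_forward = 2 * hidden * ffn  # H x F up-projection and F x H down-projection
    per_layer = attention + feed_forward
    # ALBERT-style cross-layer sharing stores a single layer's weights.
    return per_layer if shared else per_layer * layers


# Hypothetical sizes: 130 tokens (128 MIDI pitches + 2 special tokens).
V, H, E, F, L = 130, 256, 64, 1024, 4
bert_style = embedding_params(V, H) + encoder_params(H, F, L, shared=False)
albert_style = embedding_params(V, H, E) + encoder_params(H, F, L, shared=True)
print(bert_style, albert_style)  # 3179008 811136
```

A pitch vocabulary is tiny compared with a text vocabulary, so under these assumed sizes the factorized embedding saves little (33,280 vs 24,704 weights). Most of the reduction comes from cross-layer sharing.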
Abstract
Despite recent achievements of deep-learning-based automatic music generation algorithms, few approaches have been proposed to evaluate whether a single-track music excerpt was composed by automatons or by Homo sapiens. To tackle this problem, we apply a masked language model based on ALBERT for composer classification. The aim is to obtain a model that is trained only on AI-composed single-track MIDI and that estimates the probability that a given MIDI clip was composed, conditioned on the auto-generation hypothesis. In this paper, the number of parameters is reduced, two data augmentation methods are proposed, and a refined loss function is introduced to prevent overfitting. The experimental results show that our model ranks 3rd among all teams in the CSMT (2020) data challenge. Furthermore, this method could be extended to other music information retrieval tasks that are based on a small…
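The abstract says two data augmentation methods are proposed but does not name them here. As a hedged illustration only, the sketch below assumes two common choices for pitch sequences: random transposition and random cropping. The function names, the ±6 semitone limit, and the window length are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Optional

# Assumed augmentations; the paper's two methods are not specified on this page.
MIN_PITCH, MAX_PITCH = 0, 127  # valid MIDI note numbers


def transpose(pitches: List[int], semitones: int) -> List[int]:
    """Shift every pitch by the same interval, preserving the melodic contour."""
    shifted = [p + semitones for p in pitches]
    if shifted and (min(shifted) < MIN_PITCH or max(shifted) > MAX_PITCH):
        raise ValueError("transposition leaves the MIDI pitch range")
    return shifted


def random_transpose(
    pitches: List[int], max_shift: int = 6, rng: Optional[random.Random] = None
) -> List[int]:
    """Transpose by a random interval that keeps every note in 0..127."""
    if not pitches:
        return []
    rng = rng or random.Random()
    low = max(-max_shift, MIN_PITCH - min(pitches))
    high = min(max_shift, MAX_PITCH - max(pitches))
    return transpose(pitches, rng.randint(low, high))


def random_crop(
    pitches: List[int], length: int, rng: Optional[random.Random] = None
) -> List[int]:
    """Take a random contiguous window of `length` notes (or the whole clip)."""
    if length >= len(pitches):
        return list(pitches)
    rng = rng or random.Random()
    start = rng.randrange(len(pitches) - length + 1)
    return pitches[start:start + length]


clip = [60, 62, 64, 65, 67, 69, 71, 72]
rng = random.Random(42)
augmented = random_crop(random_transpose(clip, rng=rng), length=4, rng=rng)
```

Both transforms keep the interval structure of the excerpt while producing new training sequences. This is the property that makes such augmentation attractive when only a small AI-composed corpus is available.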
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis
Methods: Linear Layer · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · LAMB · Residual Connection · Attention Is All You Need
