Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes
Zixun Guo, Simon Dixon

TL;DR
Moonbeam is a transformer-based MIDI foundation model that captures both absolute and relative musical attributes, enabling improved music understanding and generation through novel tokenization and attention mechanisms.
Contribution
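The paper introduces Moonbeam, a MIDI foundation model built on a novel domain-knowledge-inspired tokenization method and Multidimensional Relative Attention (MRA), which captures relative music information without additional trainable parameters. It also proposes 2 finetuning architectures for symbolic music understanding and conditional music generation.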
Findings
Outperforms other large-scale pretrained music models in most cases, measured by accuracy and F1 score.
Achieves superior conditional music generation quality.
Captures both absolute and relative music attributes effectively.
Abstract
Moonbeam is a transformer-based foundation model for symbolic music, pretrained on a large and diverse collection of MIDI data totaling 81.6K hours of music and 18 billion tokens. Moonbeam incorporates music-domain inductive biases by capturing both absolute and relative musical attributes through the introduction of a novel domain-knowledge-inspired tokenization method and Multidimensional Relative Attention (MRA), which captures relative music information without additional trainable parameters. Leveraging the pretrained Moonbeam, we propose 2 finetuning architectures with full anticipatory capabilities, targeting 2 categories of downstream tasks: symbolic music understanding and conditional music generation (including music infilling). Our model outperforms other large-scale pretrained music models in most cases in terms of accuracy and F1 score across 3 downstream music…
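The abstract does not say how MRA obtains relative information without additional trainable parameters. As a rough illustration only, the sketch below uses a rotary-style rotation, which is one well-known parameter-free way to make attention scores depend on attribute differences rather than absolute values. Everything here is an assumption made for the example and is not taken from the paper: the function names, the choice of two attributes (onset and pitch), the slice size, and the frequency base.

```python
import math

def rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of vec by angles proportional to position."""
    out = []
    half = len(vec) // 2
    for i in range(half):
        theta = position * base ** (-i / half)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def encode(vec, attributes, slice_size):
    """Rotate each slice of vec by a different attribute value.

    Hypothetical layout: slice 0 is rotated by onset, slice 1 by pitch.
    No trainable parameters are involved in this step.
    """
    out = []
    for k, value in enumerate(attributes):
        out.extend(rotate(vec[k * slice_size:(k + 1) * slice_size], value))
    return out

def score(q, k, q_attrs, k_attrs, slice_size=4):
    """Attention logit between a query and a key after attribute-wise rotation."""
    qr = encode(q, q_attrs, slice_size)
    kr = encode(k, k_attrs, slice_size)
    return sum(a * b for a, b in zip(qr, kr))

q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
k = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

# Two note pairs with the same (onset, pitch) differences but different
# absolute values: the logits agree up to floating-point error.
a = score(q, k, q_attrs=(4, 64), k_attrs=(1, 60))
b = score(q, k, q_attrs=(10, 72), k_attrs=(7, 68))
print(abs(a - b) < 1e-9)
```

In this sketch, shifting both notes by the same onset and pitch offsets (a time shift plus a transposition) leaves the logit unchanged, while changing the interval between them changes it. That is the sense in which the score is relative along more than one attribute.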
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
Methods: Softmax · Attention Is All You Need
