Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings

Seungyeon Rhyu; Kichang Yang; Sungjun Cho; Jaehyeon Kim; Kyogu Lee; and Moontae Lee

arXiv:2407.19900·cs.SD·July 30, 2024

Open Access

TL;DR

This paper introduces a MIDI-based music generation framework using large language models with structural embeddings that do not require domain-specific annotations, enhancing reproducibility and practical deployment.

Contribution

The paper builds a MuseNet-inspired, MIDI-based generation framework and empirically studies two structural embeddings that need no domain-specific annotations such as bars and beats, which makes the approach easier to reproduce.

Findings

1. Structural embeddings can enhance specific musical aspects.
2. Multiple embedding configurations offer flexible control.
3. Open-source implementation facilitates practical use.

Abstract

Music generation introduces challenging complexities to large language models. Symbolic structures of music often include vertical harmonization as well as horizontal counterpoint, urging various adaptations and enhancements for large-scale Transformers. However, existing works share three major drawbacks: 1) their tokenization requires domain-specific annotations, such as bars and beats, that are typically missing in raw MIDI data; 2) the pure impact of enhancing token embedding methods is hardly examined without domain-specific annotations; and 3) existing works to overcome the aforementioned drawbacks, such as MuseNet, lack reproducibility. To tackle such limitations, we develop a MIDI-based music generation framework inspired by MuseNet, empirically studying two structural embeddings that do not rely on domain-specific annotations. We provide various metrics and insights that can…

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies