Multi-Genre Music Transformer -- Composing Full Length Musical Piece
Abhinav Kaushal Keshari

TL;DR
This paper introduces a Multi-Genre Transformer model that generates diverse, full-length musical compositions across genres, leveraging compound words for detailed musical event representation and achieving faster training times.
Contribution
The paper presents a novel Multi-Genre Transformer that models genre and form in music generation, improving diversity and training efficiency over previous methods.
Findings
Generated music is diverse and comparable to original tracks.
Model trains 2-5 times faster than existing models.
Effective representation of musical events using compound words.
Abstract
In music generation, the artistic factor plays a large role and poses a major challenge for AI. Previous work using adversarial training to produce new music pieces, and work modeling the compatibility of musical elements (beats, tempo, musical stems), demonstrated that this task can be learned. However, that work was limited to generating mashups or to learning features from tempo and key distributions to produce similar patterns. The Compound Word Transformer represented music generation as a sequence generation problem over musical events defined by compound words. These musical events give a more accurate description of note progression, chord changes, harmony, and the artistic factor. The objective of the project is to implement a Multi-Genre Transformer which learns to produce music pieces through a more adaptive learning process involving a more challenging task where genres…
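To make the compound-word idea in the abstract concrete, here is a minimal sketch in Python. It is not the paper's implementation: the CompoundWord class, its field names, the "metric"/"note" family labels, and the way genre is attached to every token are all assumptions chosen for illustration. It only shows how grouping several attributes of a musical event into one token shortens the sequence compared with emitting one token per attribute.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CompoundWord:
    """One sequence step that bundles several event attributes (hypothetical layout)."""
    family: str                      # "metric" or "note" (assumed labels)
    genre: str                       # assumed: genre carried on every token
    beat: Optional[int] = None       # set on metric tokens only
    pitch: Optional[int] = None      # MIDI pitch, set on note tokens only
    duration: Optional[int] = None   # in arbitrary ticks
    velocity: Optional[int] = None   # MIDI velocity


# A note is (beat, pitch, duration, velocity); this tuple format is illustrative.
Note = Tuple[int, int, int, int]


def to_compound_words(notes: List[Note], genre: str) -> List[CompoundWord]:
    """Emit one metric token per new beat, then one note token per note."""
    words: List[CompoundWord] = []
    current_beat: Optional[int] = None
    for beat, pitch, duration, velocity in sorted(notes):
        if beat != current_beat:
            words.append(CompoundWord("metric", genre, beat=beat))
            current_beat = beat
        words.append(
            CompoundWord("note", genre, pitch=pitch, duration=duration, velocity=velocity)
        )
    return words


def flat_length(notes: List[Note]) -> int:
    """Length of a flat encoding: one token per beat plus three tokens per note."""
    beats = {note[0] for note in notes}
    return len(beats) + 3 * len(notes)


notes = [(0, 60, 4, 80), (0, 64, 4, 80), (1, 67, 2, 72)]
words = to_compound_words(notes, genre="jazz")
print(len(words), "compound words vs", flat_length(notes), "flat tokens")
```

In this toy setup the three notes become five compound words instead of eleven flat tokens. Shorter sequences of this kind are one plausible reason a compound-word model can train faster, though the sketch does not reproduce the reported speedup.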
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Linear Layer · Dropout · Residual Connection
