The Power of Fragmentation: A Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation

Guowei Wu; Shipei Liu; Xiaoya Fan

arXiv:2205.08579·cs.SD·July 12, 2022

TL;DR

This paper introduces a hierarchical Transformer model for symbolic music generation that captures multi-scale structural elements like sections and chords, leading to more realistic and stylistically consistent music.

Contribution

It proposes a novel hierarchical Transformer with a Fragment Scope Localization layer and multi-scale attention for better structural understanding in music generation.

Findings

01

Outperforms current state-of-the-art models in quantitative metrics.

02

Generates music that is more realistic and reuses melodies more, according to visual evaluation.

03

Achieves consistent style across generated sections with Music Style Normalization.

Abstract

Symbolic music generation relies on the contextual representation capabilities of the generative model, and the most prevalent approach is the Transformer-based model. Learning musical context also depends on the structural elements of music, e.g. intro, verse, and chorus, which are currently overlooked by the research community. In this paper, we propose a hierarchical Transformer model to learn multi-scale contexts in music. In the encoding phase, we first design a Fragment Scope Localization layer to segment the music into chords and sections. Then, we use a multi-scale attention mechanism to learn note-, chord-, and section-level contexts. In the decoding phase, we propose a hierarchical Transformer model that uses fine-decoders to generate sections in parallel and a coarse-decoder to decode the combined music. We also design a Music Style Normalization layer to…
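To make the idea of note-, chord-, and section-level contexts concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the function names (`fragment_scopes`, `scoped_attention`, `multi_scale_context`) are hypothetical, the chord and section boundaries are assumed to be given as inputs rather than predicted by a learned Fragment Scope Localization layer, the note-level context is simply the raw embedding, and the three contexts are combined by plain concatenation. It only illustrates how self-attention can be restricted to fragments of different sizes.

```python
import math


def fragment_scopes(starts, length):
    """Map each position to the (start, end) span of its fragment.

    `starts` holds the sorted start index of each fragment (first is 0).
    Assumption: boundaries are given; the paper learns them.
    """
    scopes = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else length
        scopes.extend([(start, end)] * (end - start))
    return scopes


def scoped_attention(x, scopes):
    """Scaled dot-product self-attention restricted to each position's scope.

    Queries, keys, and values are all the raw embeddings `x`
    (no learned projections, to keep the sketch short).
    """
    dim = len(x[0])
    out = []
    for i, (start, end) in enumerate(scopes):
        span = range(start, end)
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(dim)
            for j in span
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * x[j][k] for w, j in zip(weights, span))
            for k in range(dim)
        ])
    return out


def multi_scale_context(x, chord_starts, section_starts):
    """Concatenate note-, chord-, and section-level context per position."""
    n = len(x)
    chord = scoped_attention(x, fragment_scopes(chord_starts, n))
    section = scoped_attention(x, fragment_scopes(section_starts, n))
    return [note + c + s for note, c, s in zip(x, chord, section)]


# Four toy note embeddings, two chords of two notes each, one section.
notes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
context = multi_scale_context(notes, chord_starts=[0, 2], section_starts=[0])
```

Each position in `context` carries its own embedding followed by an attention-weighted summary of its chord and of its section, so a downstream layer sees all three scales at once.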

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Adam · VERtex Similarity Embeddings · Byte Pair Encoding · Residual Connection · Label Smoothing · Absolute Position Encodings