Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation

Botao Yu; Peiling Lu; Rui Wang; Wei Hu; Xu Tan; Wei Ye; Shikun Zhang; Tao Qin; Tie-Yan Liu

arXiv:2210.10349 · cs.SD · November 1, 2022 · 20 cites

TL;DR

Museformer introduces a novel Transformer architecture with combined fine- and coarse-grained attention mechanisms, enabling efficient modeling of long music sequences and capturing musical structures more effectively.

Contribution

The paper proposes Museformer, a Transformer variant with dual attention mechanisms that improve long-sequence music generation and structural modeling.

Findings

01 Can model music sequences more than 3 times longer than full-attention models can

02 Generates high-quality music with better structural coherence

03 Outperforms existing models in objective and subjective evaluations

Abstract

Symbolic music generation aims to generate music scores automatically. A recent trend is to use Transformer or its variants in music generation, which is, however, suboptimal, because the full attention cannot efficiently model the typically long music sequences (e.g., over 10,000 tokens), and the existing models have shortcomings in generating musical repetition structures. In this paper, we propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation. Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures (e.g., the previous 1st, 2nd, 4th and 8th bars, selected via similarity statistics); with the coarse-grained attention, a token only attends to the summarization of the other bars rather than each token of them so as to reduce the…
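The attention pattern described in the abstract can be illustrated with a small mask-building sketch. This is not the authors' implementation: the function name `build_masks`, the one-summary-per-bar layout, the causal restriction, and the rule that a token also sees the earlier tokens of its own bar are all assumptions made for illustration. Only the bar offsets (1, 2, 4, 8) come from the abstract.

```python
def build_masks(bar_ids, offsets=(1, 2, 4, 8)):
    """Build illustrative fine- and coarse-grained attention masks.

    bar_ids[i] is the 0-based bar index of token i (non-decreasing).
    fine[i][j]   is True if token i attends directly to token j.
    coarse[i][b] is True if token i attends to one summary of bar b.
    """
    n_bars = max(bar_ids) + 1
    fine, coarse = [], []
    for i, q_bar in enumerate(bar_ids):
        fine_row = []
        for j, k_bar in enumerate(bar_ids):
            # Fine-grained: every token of the selected previous bars.
            in_selected_bar = (q_bar - k_bar) in offsets
            # Assumption: a token also sees its own bar up to itself.
            in_own_bar_so_far = k_bar == q_bar and j <= i
            fine_row.append(in_selected_bar or in_own_bar_so_far)
        fine.append(fine_row)
        # Coarse-grained: summaries of the remaining earlier bars only.
        coarse.append([
            b < q_bar and (q_bar - b) not in offsets
            for b in range(n_bars)
        ])
    return fine, coarse


# Hypothetical toy input: four bars with two tokens each.
example_bars = [0, 0, 1, 1, 2, 2, 3, 3]
example_fine, example_coarse = build_masks(example_bars)
print(example_fine[6])    # direct attention of the first token of bar 3
print(example_coarse[6])  # bar summaries visible to the same token
```

In this toy case the first token of bar 3 attends directly to the tokens of bars 1 and 2 and to itself, and reaches bar 0 only through its summary. Each query therefore touches a handful of full bars plus one entry per remaining bar, rather than every earlier token.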

Code & Models

Repositories

microsoft/muzic
PyTorch · Official

Videos

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation · SlidesLive

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Adam · Label Smoothing · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding