Music Transformer

Cheng-Zhi Anna Huang; Ashish Vaswani; Jakob Uszkoreit; Noam Shazeer; Ian Simon; Curtis Hawthorne; Andrew M. Dai; Matthew D. Hoffman; Monica Dinculescu; Douglas Eck

arXiv:1809.04281 · cs.LG · December 13, 2018 · 49 cites

TL;DR

This paper introduces a memory-efficient relative attention mechanism for Transformers, enabling the generation of long, coherent musical compositions and accompaniments, with state-of-the-art results on the Piano-e-Competition dataset.

Contribution

It proposes a relative attention algorithm whose memory for intermediate relative-position information grows linearly, rather than quadratically, with sequence length, improving long-sequence modeling in music generation tasks.
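As an illustration of how relative attention can avoid a quadratic intermediate, here is a minimal NumPy sketch of the reshape trick the paper calls "skewing", assuming a single head, no batch dimension, and a causal (decoder) mask. The names `skew` and `relative_attention` are hypothetical, as is the convention that row r of the relative embedding matrix `e_r` (shape L × D) encodes distance r - (L - 1). Instead of building an L × L × D tensor of per-pair embeddings, it computes `q @ e_r.T` once and rearranges that L × L matrix with a pad, a reshape, and a slice.

```python
import numpy as np


def skew(qe: np.ndarray) -> np.ndarray:
    """Rearrange qe = q @ e_r.T, indexed [query i, relative row r], into
    s_rel indexed [query i, key j], with s_rel[i, j] = q[i] . e_r[j - i + L - 1].

    Hypothetical helper. Only entries with j <= i are meaningful; the rest
    must be hidden by a causal mask.
    """
    L = qe.shape[0]
    padded = np.pad(qe, ((0, 0), (1, 0)))  # prepend a zero column -> (L, L + 1)
    reshaped = padded.reshape(L + 1, L)    # reinterpret the same values as (L + 1, L)
    return reshaped[1:]                    # drop the first row -> (L, L)


def relative_attention(q, k, v, e_r):
    """Single-head causal attention: softmax((q k^T + s_rel) / sqrt(D)) v."""
    L, d = q.shape
    s_rel = skew(q @ e_r.T)                # (L, L); no (L, L, D) tensor is built
    logits = (q @ k.T + s_rel) / np.sqrt(d)
    future = np.triu(np.ones((L, L), dtype=bool), k=1)
    logits = np.where(future, -np.inf, logits)    # causal mask hides keys j > i
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Check the skewed logits against a direct per-pair computation.
rng = np.random.default_rng(0)
L, d = 6, 4
q, k, v, e_r = (rng.standard_normal((L, d)) for _ in range(4))
s_rel = skew(q @ e_r.T)
direct = np.array([[q[i] @ e_r[j - i + L - 1] if j <= i else 0.0
                    for j in range(L)] for i in range(L)])
assert np.allclose(np.tril(s_rel), direct)
print(relative_attention(q, k, v, e_r).shape)  # (6, 4)
```

The pad and reshape copy an L × (L + 1) matrix, which is the same order as the L × L logits attention already stores; what disappears is the L × L × D tensor of gathered embeddings.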

Findings

1. Generated minute-long musical compositions with coherent structure.
2. Achieved state-of-the-art results on the Piano-e-Competition dataset.
3. Demonstrated effective motif elaboration and accompaniment generation.

Abstract

Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to reusing of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence. This suggests that self-attention might also be well-suited to modeling music. In musical composition and performance, however, relative timing is critically important. Existing approaches for representing relative positional information in the Transformer modulate attention based on pairwise distance (Shaw et al., 2018). This is impractical for long sequences such as musical compositions since their memory complexity for intermediate relative information is quadratic in the sequence length. We…
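To make the quadratic cost concrete, the pairwise-distance formulation the abstract attributes to Shaw et al. (2018) can be sketched by gathering one embedding per (query, key) pair, which materializes an L × L × D tensor. This NumPy sketch is simplified (one head, only the logit term, causal distances only, no clipping of distances), and `shaw_style_relative_logits` and `intermediate_floats` are hypothetical names. It assumes row r of `e_r` encodes distance r - (L - 1); the sizes L = 2048 and D = 512 are illustrative.

```python
import numpy as np


def shaw_style_relative_logits(q: np.ndarray, e_r: np.ndarray) -> np.ndarray:
    """s_rel[i, j] = q[i] . r[i, j], where r[i, j] embeds distance j - i.

    Hypothetical helper that builds the (L, L, D) tensor r explicitly,
    which is the step whose memory is quadratic in L.
    """
    L = q.shape[0]
    i = np.arange(L)[:, None]
    j = np.arange(L)[None, :]
    # Future keys (j > i) are clamped to distance 0; a causal mask hides them later.
    idx = np.minimum(j - i, 0) + (L - 1)   # (L, L) row indices into e_r
    r = e_r[idx]                            # (L, L, D) gathered embeddings
    return np.einsum("id,ijd->ij", q, r)


def intermediate_floats(L: int, D: int):
    """Floats held for relative information: explicit r versus e_r alone."""
    return L * L * D, L * D


full, linear = intermediate_floats(2048, 512)
print(full * 4, linear * 4)  # bytes at float32: 8589934592 4194304
```

At float32 that is about 8.6 GB for the explicit tensor against about 4.2 MB for `e_r` alone, for one sequence at these illustrative sizes.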

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing