Music Transformer
Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

TL;DR
This paper introduces a memory-efficient relative attention mechanism for Transformers, enabling the generation of long, coherent musical compositions and accompaniments, with state-of-the-art results on music datasets.
Contribution
It proposes a linear-memory relative attention algorithm for Transformers, improving long sequence modeling in music generation tasks.
Findings
Generated minute-long musical compositions with coherent structure
Achieved state-of-the-art results on the Piano-e-Competition dataset
Demonstrated effective motif elaboration and accompaniment generation
Abstract
Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to reusing of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence. This suggests that self-attention might also be well-suited to modeling music. In musical composition and performance, however, relative timing is critically important. Existing approaches for representing relative positional information in the Transformer modulate attention based on pairwise distance (Shaw et al., 2018). This is impractical for long sequences such as musical compositions since their memory complexity for intermediate relative information is quadratic in the sequence length. We…
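The abstract above is cut off before the method is described. To illustrate the memory issue it raises, here is a minimal NumPy sketch contrasting a pairwise relative-attention term, which materializes an (L, L, D) intermediate, with a "skewing" rearrangement that only needs an (L, L) matrix. The function names, the single-head unbatched shapes, and the ordering of the relative embeddings `e_r` (row 0 for the farthest past position, last row for distance 0) are assumptions made for this illustration, not the authors' code.

```python
import numpy as np


def relative_logits_naive(q, e_r):
    """Relative logits via an explicit (L, L, D) tensor of pairwise embeddings.

    q:   (L, D) queries.
    e_r: (L, D) relative embeddings; row L-1-k is assumed to hold the
         embedding for "k steps back" (an assumption of this sketch).
    """
    L = q.shape[0]
    dist = np.arange(L)[:, None] - np.arange(L)[None, :]  # i - j
    idx = np.clip((L - 1) - dist, 0, L - 1)               # future entries get clipped
    r = e_r[idx]                                          # (L, L, D) intermediate
    s = np.einsum("id,ijd->ij", q, r)
    return np.tril(s)                                     # keep causal entries only


def relative_logits_skewed(q, e_r):
    """Same logits, but the largest intermediate is (L, L+1)."""
    L = q.shape[0]
    qe = q @ e_r.T                          # (L, L), indexed by (query, relative distance)
    padded = np.pad(qe, ((0, 0), (1, 0)))   # prepend one zero column -> (L, L+1)
    s = padded.reshape(L + 1, L)[1:]        # reshape, then drop the first row -> (L, L)
    return np.tril(s)                       # entries above the diagonal are not meaningful
```

On random inputs both functions should agree on the lower triangle, which is the part a causal mask keeps; the difference is only in how much intermediate memory each one allocates.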
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing
