SinTra: Learning an inspiration model from a single multi-track music segment
Qingwei Song, Qiwei Sun, Dongsheng Guo, Haiyong Zheng

TL;DR
SinTra is a novel auto-regressive model that learns to generate coherent, multi-instrument polyphonic music from a single segment using a pyramid Transformer-XL architecture and a new pitch-group representation.
Contribution
The paper introduces SinTra, a single-segment learning framework with a pyramid Transformer-XL and a pitch-group representation for high-quality multi-instrument music generation.
Findings
SinTra outperforms Music Transformer in learning from a single music segment.
The pyramid structure reduces overly-fragmented notes.
The model effectively captures musical structure and inter-track relationships.
Abstract
In this paper, we propose SinTra, an auto-regressive sequential generative model that learns from a single multi-track music segment to generate coherent, aesthetic, and varied multi-instrument polyphonic music spanning an arbitrary number of bars. To keep the generated samples relevant to the training music, we present a novel pitch-group representation. SinTra, consisting of a pyramid of Transformer-XL models with a multi-scale training strategy, can learn both the musical structure and the relative positional relationships between notes of the single training segment. Additionally, to maintain inter-track correlation, we use a convolution operation to process multi-track music; when decoding, the tracks are independent of each other to prevent interference. We evaluate SinTra with both a subjective study and objective metrics. The comparison results…
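The multi-scale idea in the abstract can be illustrated with a small sketch. It assumes the scales of the pyramid differ in temporal resolution and that a coarser scale is obtained by merging consecutive time steps; the abstract states neither, so this is an illustration of the general coarse-to-fine setup, not the authors' method. The names `downsample_track` and `build_pyramid`, the set-of-pitches step encoding, and the factors `(4, 2, 1)` are all hypothetical, and the sketch does not implement the paper's pitch-group representation.

```python
from typing import FrozenSet, List, Sequence

Step = FrozenSet[int]   # MIDI pitches sounding at one time step (assumed encoding)
Track = List[Step]      # one instrument over time
Segment = List[Track]   # multi-track segment; all tracks have equal length


def downsample_track(track: Track, factor: int) -> Track:
    """Merge every `factor` consecutive steps into one step (union of pitches)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    coarse: Track = []
    for start in range(0, len(track), factor):
        # Union of all pitches active anywhere inside this window.
        coarse.append(frozenset().union(*track[start:start + factor]))
    return coarse


def build_pyramid(segment: Segment, factors: Sequence[int] = (4, 2, 1)) -> List[Segment]:
    """Return the segment at each scale, ordered from coarsest to finest."""
    return [[downsample_track(track, f) for track in segment] for f in factors]


# Hypothetical two-track segment with eight time steps per track.
piano: Track = [
    frozenset({60}), frozenset({60}), frozenset({64}), frozenset(),
    frozenset({67}), frozenset({67}), frozenset(), frozenset(),
]
bass: Track = [frozenset({36})] * 4 + [frozenset({43})] * 4

pyramid = build_pyramid([piano, bass])
for factor, scale in zip((4, 2, 1), pyramid):
    print(f"factor {factor}: {len(scale[0])} steps per track")
```

In a coarse-to-fine setup of this kind, one model would be trained per scale, starting from the coarsest segment, so that early stages see the overall structure and later stages add detail. How SinTra passes information between scales is not described in the abstract.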
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Dropout · Label Smoothing · Adaptive Input Representations · Cosine Annealing · Adam · Multi-Head Attention
