Melody Infilling with User-Provided Structural Context
Chih-Pin Tan, Alvin W.Y. Su, Yi-Hsuan Yang

TL;DR
This paper introduces a Transformer-based music score infilling model that incorporates user-provided structural information to generate higher-quality, style-consistent melodies by considering musical form and structure.
Contribution
It presents a novel structure-aware conditioning method with an attention-selecting module to improve music infilling by leveraging structural context.
Findings
Outperforms two existing structure-agnostic infilling models in both objective and subjective evaluations
Makes effective use of user-provided structural information, improving coherence
Generates pop-style melodies that receive higher subjective ratings
Abstract
This paper proposes a novel Transformer-based model for music score infilling, i.e., generating a music passage that fills the gap between given past and future contexts. While existing infilling approaches can generate a passage that connects smoothly with the given contexts at the local level, they do not take into account the musical form or structure of the music and may therefore generate overly smooth results. To address this issue, we propose a structure-aware conditioning approach that employs a novel attention-selecting module to supply user-provided structure-related information to the Transformer for infilling. With both objective and subjective evaluations, we show that the proposed model can harness the structural information effectively and generate pop-style melodies of higher quality than two existing structure-agnostic infilling models.
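The infilling setup described in the abstract can be illustrated with a small data-preparation sketch. This is not the authors' implementation: the function name `make_infilling_example`, the per-bar token strings, and the per-bar section labels ("A", "B", "C") are all hypothetical, and the paper's attention-selecting module is not reproduced here. The sketch only shows how a melody might be split into past context, gap, and future context, and how user-provided structure labels could point to context bars related to the gap.

```python
from typing import Dict, List, Sequence


def make_infilling_example(
    bars: Sequence[str],
    labels: Sequence[str],
    gap_start: int,
    gap_end: int,
) -> Dict[str, List]:
    """Split a melody into past / gap / future and collect structure hints.

    bars   : one token string per bar (hypothetical encoding)
    labels : one user-provided section label per bar, e.g. "A" or "B"
    The gap covers bars[gap_start:gap_end].
    """
    if len(bars) != len(labels):
        raise ValueError("bars and labels must have the same length")
    if not 0 < gap_start < gap_end < len(bars):
        raise ValueError("gap must leave non-empty past and future contexts")

    gap_labels = set(labels[gap_start:gap_end])
    # Context bars that share a section label with the gap. In this sketch
    # they stand in for the structure-related context a model could use.
    related = [
        i
        for i in range(len(bars))
        if not (gap_start <= i < gap_end) and labels[i] in gap_labels
    ]
    return {
        "past": list(bars[:gap_start]),
        "target": list(bars[gap_start:gap_end]),
        "future": list(bars[gap_end:]),
        "related_context_bars": related,
    }


example = make_infilling_example(
    bars=["C4 E4", "G4 E4", "D4 F4", "A4 F4", "C4 E4", "G4 E4", "B3 D4", "C4 C4"],
    labels=["A", "A", "B", "B", "A", "A", "C", "C"],
    gap_start=4,
    gap_end=6,
)
print(example["related_context_bars"])  # [0, 1]
```

Here the gap (bars 4 and 5) is labeled "A", so the earlier "A" bars are flagged as related context, while the "B" and "C" bars are not. A structure-agnostic model would see only the past and future tokens without this hint.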
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Label Smoothing · Softmax · Byte Pair Encoding · Multi-Head Attention · Adam · Dense Connections · Absolute Position Encodings
