Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach
Shih-Lun Wu, Yi-Hsuan Yang

TL;DR
This paper introduces a two-stage Transformer framework for generating expressive piano performances: it first composes a lead sheet (melody + chords), then embellishes it with accompaniment and expressive touches, improving musical structure, richness, and coherence.
Contribution
Factorizing generation into a compose stage and an embellish stage improves long-range structure in generated piano performances and allows pretraining on non-piano data.
Findings
Halves the gap in structureness between a current state-of-the-art model and real performances.
Improves other musical aspects of the generated music, such as richness and coherence.
Both objective and subjective experiments support these improvements.
Abstract
Even with strong sequence models like Transformers, generating expressive piano performances with long-range musical structures remains challenging. Meanwhile, methods to compose well-structured melodies or lead sheets (melody + chords), i.e., simpler forms of music, gained more success. Observing the above, we devise a two-stage Transformer-based framework that Composes a lead sheet first, and then Embellishes it with accompaniment and expressive touches. Such a factorization also enables pretraining on non-piano data. Our objective and subjective experiments show that Compose & Embellish shrinks the gap in structureness between a current state of the art and real performances by half, and improves other musical aspects such as richness and coherence as well.
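The two-stage factorization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`sample_sequence`, `compose_and_embellish`, `make_toy_models`), the token vocabulary, and the toy rule-based "models" are all hypothetical stand-ins for the Transformers used in the paper, and the way the second stage is conditioned on the lead sheet here is an assumption made for illustration only.

```python
import random
from typing import Callable, List, Tuple

Token = str
BOS, EOS = "<bos>", "<eos>"


def sample_sequence(step: Callable[[List[Token]], Token], max_len: int) -> List[Token]:
    """Autoregressive loop: call `step` on the prefix until EOS or max_len tokens."""
    seq: List[Token] = [BOS]
    while len(seq) < max_len:
        tok = step(seq)
        seq.append(tok)
        if tok == EOS:
            break
    return seq


def compose_and_embellish(
    compose_step: Callable[[List[Token]], Token],
    embellish_step: Callable[[List[Token], List[Token]], Token],
    max_lead_len: int = 32,
    max_perf_len: int = 128,
) -> Tuple[List[Token], List[Token]]:
    # Stage 1 (Compose): sample a lead sheet on its own.
    lead_sheet = sample_sequence(compose_step, max_lead_len)
    # Stage 2 (Embellish): sample a performance conditioned on that lead sheet.
    performance = sample_sequence(
        lambda prefix: embellish_step(lead_sheet, prefix), max_perf_len
    )
    return lead_sheet, performance


def make_toy_models(seed: int = 0):
    """Hypothetical rule-based stand-ins for the two trained models."""
    rng = random.Random(seed)
    melody = ["C4", "E4", "G4", "E4"]

    def compose_step(prefix: List[Token]) -> Token:
        i = len(prefix) - 1  # tokens emitted so far, excluding BOS
        return melody[i] if i < len(melody) else EOS

    def embellish_step(lead_sheet: List[Token], prefix: List[Token]) -> Token:
        notes = [t for t in lead_sheet if t not in (BOS, EOS)]
        i = len(prefix) - 1
        if i >= 2 * len(notes):
            return EOS
        if i % 2 == 0:
            return notes[i // 2]  # keep the lead-sheet note
        return f"vel_{rng.randint(40, 100)}"  # toy "expressive" velocity token

    return compose_step, embellish_step


lead_sheet, performance = compose_and_embellish(*make_toy_models())
```

In this sketch the first stage never sees performance tokens, which mirrors the abstract's point that the lead-sheet model could be pretrained on non-piano data; only the second stage needs paired lead sheets and performances.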
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
