Generating High-quality Symbolic Music Using Fine-grained Discriminators
Zhedong Zhang, Liang Li, Jiehua Zhang, Zhenghui Hu, Hongkui Wang, Chenggang Yan, Jian Yang, Yuankai Qi

TL;DR
This paper introduces a novel symbolic music generation approach that employs separate discriminators for melody and rhythm, enhancing the quality of generated music by explicitly modeling these musical dimensions.
Contribution
The work proposes decoupling melody and rhythm with dedicated discriminators, improving the generator’s ability to produce more human-like music with fine-grained control.
Findings
Outperforms state-of-the-art methods on the POP909 benchmark
Improves both objective and subjective music quality metrics
Demonstrates the effectiveness of fine-grained discriminators in music generation
Abstract
Existing symbolic music generation methods usually utilize a discriminator to improve the quality of generated music via global perception of music. However, considering the complexity of information in music, such as rhythm and melody, a single discriminator cannot fully reflect the differences in these two primary dimensions of music. In this work, we propose to decouple the melody and rhythm from music, and design corresponding fine-grained discriminators to tackle the aforementioned issues. Specifically, equipped with a pitch augmentation strategy, the melody discriminator discerns the melody variations presented by the generated samples. By contrast, the rhythm discriminator, enhanced with bar-level relative positional encoding, focuses on the velocity of generated notes. Such a design allows the generator to be more explicitly aware of which aspects should be adjusted in the…
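To make the decoupling idea concrete, here is a minimal Python sketch of how a note sequence might be split into a melody view and a rhythm view before each is passed to its own discriminator. Everything in it is an illustrative assumption rather than the paper's implementation: the `Note` fields, the 16-tick bar grid, the use of transposition as the pitch augmentation, and the bar-difference matrix as a stand-in for bar-level relative positions are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Note:
    """Hypothetical note event; field names are assumptions for this sketch."""
    pitch: int     # MIDI pitch, 0-127
    onset: int     # onset time in ticks from the start of the piece
    duration: int  # length in ticks
    velocity: int  # MIDI velocity, 0-127


TICKS_PER_BAR = 16  # assumption: 4/4 time on a sixteenth-note grid


def melody_view(notes: List[Note]) -> List[int]:
    """Keep only pitch information (what a melody discriminator might see)."""
    return [n.pitch for n in notes]


def rhythm_view(notes: List[Note],
                ticks_per_bar: int = TICKS_PER_BAR) -> List[Tuple[int, int, int]]:
    """Keep only timing and dynamics: (position in bar, duration, velocity)."""
    return [(n.onset % ticks_per_bar, n.duration, n.velocity) for n in notes]


def transpose(pitches: List[int], semitones: int) -> List[int]:
    """Assumed pitch augmentation: shift every pitch by the same interval."""
    shifted = [p + semitones for p in pitches]
    if any(p < 0 or p > 127 for p in shifted):
        raise ValueError("transposition leaves the MIDI pitch range")
    return shifted


def bar_relative_offsets(notes: List[Note],
                         ticks_per_bar: int = TICKS_PER_BAR) -> List[List[int]]:
    """Matrix of bar-index differences; entry [i][j] is bar(j) - bar(i)."""
    bars = [n.onset // ticks_per_bar for n in notes]
    return [[bj - bi for bj in bars] for bi in bars]


# Three notes: two in the first bar, one at the start of the second bar.
notes = [Note(60, 0, 4, 80), Note(64, 4, 4, 70), Note(67, 16, 8, 90)]
melody = melody_view(notes)
rhythm = rhythm_view(notes)
augmented = transpose(melody, 2)
offsets = bar_relative_offsets(notes)
```

In this sketch the transposed melody keeps the same intervals between consecutive notes, which is why transposition is a plausible augmentation for a model that should judge melodic contour rather than absolute pitch. The offset matrix depends only on which bar each note falls in, so it is unchanged if the whole sequence is shifted by a whole number of bars.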
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Image Processing and 3D Reconstruction
Methods: Attentive Walk-Aggregating Graph Neural Network
