MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit
Yutian Wang, Wanyin Yang, Zhenrong Dai, Yilong Zhang, Kun Zhao, Hui Wang

TL;DR
MeloTrans is a novel music generation model that mimics human compositional habits by integrating motif development rules with neural networks, leading to more structured and diverse symbolic music output.
Contribution
The paper introduces MeloTrans, a text-to-music model that incorporates human-like motif development rules, and the POP909_M dataset with motif labels to enhance neural music generation.
Findings
MeloTrans outperforms existing models in musicality and diversity.
The POP909_M dataset enables better learning of musical motifs.
MeloTrans surpasses LLMs like ChatGPT-4 in music generation quality.
Abstract
Neural network models currently show powerful sequence prediction ability and are used in many automatic composition models. Humans, however, compose music very differently. Composers usually start by creating musical motifs and then develop them into music through a series of rules. This process ensures that the music has a specific structure and pattern of change. However, it is difficult for neural network models to learn these composition rules from training data, which results in a lack of musicality and diversity in the generated music. This paper posits that integrating the learning capabilities of neural networks with human-derived knowledge may lead to better results. To achieve this, we develop the POP909_M dataset, the first to include labels for musical motifs and their variants, providing a basis for mimicking human compositional habits. Building…
Taxonomy
Topics: Diverse Music Education Insights
