MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation

Zihao Wang; Haoxuan Liu; Jiaxing Yu; Tao Zhang; Yan Liu; Kejun Zhang

arXiv:2407.03188 · cs.SD · July 12, 2024

Open Access

TL;DR

This paper introduces a new task and dataset for aligning colloquial human descriptions with AI-generated music, proposing a novel end-to-end framework that improves human-AI musical collaboration.

Contribution

The paper presents the Colloquial Description-to-Song Generation task, a new dataset (CaiMD), and the MuDiT/MuSiT framework for aligning colloquial language with musical output.

Findings

01. CaiMD dataset offers diverse, high-quality colloquial music descriptions.

02. MuDiT/MuSiT achieves effective cross-modal alignment and cohesive music generation.

03. Framework enhances human-AI collaboration in creative music processes.

Abstract

Amid the rising intersection of generative AI and human artistic processes, this study probes the critical yet less-explored terrain of alignment in human-centric automatic song composition. We propose a novel task of Colloquial Description-to-Song Generation, which focuses on aligning the generated content with colloquial human expressions. This task is aimed at bridging the gap between colloquial language understanding and auditory expression within an AI model, with the ultimate goal of creating songs that accurately satisfy human auditory expectations and structurally align with musical norms. Current datasets are limited due to their narrow descriptive scope, semantic gaps and inaccuracies. To overcome data scarcity in this domain, we present the Caichong Music Dataset (CaiMD). CaiMD is manually annotated by both professional musicians and amateurs, offering diverse perspectives…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: ALIGN