MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core
Callie C. Liao, Duoduo Liao, Ellie L. Zhang

TL;DR
MusicAIR is a multimodal AI framework that uses an algorithm-driven core to generate coherent, human-like music from lyrics, text, or images, reducing copyright risk and making music creation more accessible to musicians.
Contribution
The paper presents a new multimodal AI music generation framework with an algorithm-driven symbolic core, enabling copyright-safe, theory-compliant music creation from multiple input modalities.
Findings
Achieves an average key confidence of 85%, compared with 79% for human composers.
Generates diverse, human-like music compositions.
Supports lyric-to-song, text-to-music, and image-to-music generation.
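This summary does not say how key confidence is measured. As a rough illustration only, one widely used way to score how clearly a melody sits in a key is Krumhansl-Schmuckler key finding: correlate the melody's pitch-class histogram with a reference profile for each of the 24 major and minor keys, then report the best correlation. The sketch below assumes that approach; it is not taken from the paper, and the function names `pearson` and `estimate_key` are hypothetical.

```python
from math import sqrt

# Krumhansl-Kessler key profiles; index 0 is the tonic of the candidate key.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def estimate_key(midi_pitches):
    """Return (key_name, correlation) for the best-matching key.

    Assumption: the correlation stands in for a "key confidence" score;
    the paper's own metric may be defined differently.
    """
    # Count how often each of the 12 pitch classes occurs.
    histogram = [0] * 12
    for pitch in midi_pitches:
        histogram[pitch % 12] += 1

    best_name, best_score = None, None
    for tonic in range(12):
        # Rotate so that index 0 lines up with the candidate tonic.
        rotated = histogram[tonic:] + histogram[:tonic]
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            score = pearson(rotated, profile)
            if best_score is None or score > best_score:
                best_name, best_score = f"{NAMES[tonic]} {mode}", score
    return best_name, best_score


# A C major scale with extra weight on the tonic triad (MIDI note numbers).
melody = [60, 62, 64, 65, 67, 69, 71, 72, 60, 64, 67, 60]
key_name, confidence = estimate_key(melody)
print(key_name, round(confidence, 2))
```

A note-count histogram ignores note durations; weighting each pitch class by duration is a common refinement when a score with rhythmic values is available.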
Abstract
Recent advances in generative AI have made music generation a prominent research focus. However, many neural-based models rely on large datasets, raising concerns about copyright infringement and high-performance costs. In contrast, we propose MusicAIR, an innovative multimodal AI music generation framework powered by a novel algorithm-driven symbolic music core, effectively mitigating copyright infringement risks. The music core algorithms connect critical lyrical and rhythmic information to automatically derive musical features, creating a complete, coherent melodic score solely from the lyrics. The MusicAIR framework facilitates music generation from lyrics, text, and images. The generated score adheres to established principles of music theory, lyrical structure, and rhythmic conventions. We developed Generate AI Music (GenAIM), a web tool using MusicAIR for lyric-to-song,…
Taxonomy
Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing
