MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core

Callie C. Liao; Duoduo Liao; Ellie L. Zhang

arXiv:2511.17323·cs.SD·November 24, 2025

Open Access

TL;DR

MusicAIR introduces a novel multimodal AI framework that generates coherent, human-like music from lyrics, text, or images using an algorithm-driven core, reducing copyright risks and enhancing accessibility for musicians.

Contribution

The paper presents a new multimodal AI music generation framework with an algorithm-driven symbolic core, enabling copyright-safe, theory-compliant music creation from multiple input modalities.

Findings

01. Achieves an average key confidence of 85%, compared with 79% for human composers.

02. Generates diverse, human-like music compositions.

03. Supports lyric-to-song, text-to-music, and image-to-music generation.
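
The key confidence figure in the first finding is not defined in this summary. As a rough illustration only, the sketch below scores a melody with the Krumhansl-Schmuckler key-finding method, a common way to estimate a key and attach a confidence-like value (the Pearson correlation with the best-matching key profile). This technique is an assumption for illustration and is not confirmed to be the authors' metric. The function names `pearson` and `estimate_key` and the `(midi_pitch, duration)` input format are hypothetical.

```python
from math import sqrt

# Krumhansl-Kessler key profiles; index 0 is the tonic of the key.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def estimate_key(notes):
    """Return (tonic, mode, correlation) for the best-matching key.

    `notes` is an iterable of (midi_pitch, duration) pairs; this input
    format is an assumption made for the sketch.
    """
    # Duration-weighted pitch-class histogram of the melody.
    histogram = [0.0] * 12
    for pitch, duration in notes:
        histogram[pitch % 12] += duration

    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so its tonic weight lands on `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            r = pearson(histogram, rotated)
            if best is None or r > best[2]:
                best = (NAMES[tonic], mode, r)
    return best


# A short melody that dwells on C, E, and G.
melody = [(60, 4), (64, 2), (67, 3), (62, 1), (65, 1), (69, 1), (71, 1)]
tonic, mode, confidence = estimate_key(melody)
print(f"{tonic} {mode}, correlation {confidence:.2f}")
```

Under this reading, a per-piece score close to 1 means the pitch content fits one key profile closely, and averaging the score over a set of pieces would give a single summary number like the percentages reported above.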

Abstract

Recent advances in generative AI have made music generation a prominent research focus. However, many neural-based models rely on large datasets, raising concerns about copyright infringement and high-performance costs. In contrast, we propose MusicAIR, an innovative multimodal AI music generation framework powered by a novel algorithm-driven symbolic music core, effectively mitigating copyright infringement risks. The music core algorithms connect critical lyrical and rhythmic information to automatically derive musical features, creating a complete, coherent melodic score solely from the lyrics. The MusicAIR framework facilitates music generation from lyrics, text, and images. The generated score adheres to established principles of music theory, lyrical structure, and rhythmic conventions. We developed Generate AI Music (GenAIM), a web tool using MusicAIR for lyric-to-song,…

Taxonomy

Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing