Aligning Generative Music AI with Human Preferences: Methods and Challenges
Dorien Herremans, Abhinaba Roy

TL;DR
This paper reviews methods for aligning generative music AI with human preferences, highlighting recent breakthroughs, open challenges, and future directions for improving the subjective quality and applicability of generated music.
Contribution
The paper systematically discusses recent preference alignment techniques for music AI, emphasizing both their potential and their open challenges for human-AI musical collaboration.
Findings
Preference learning improves subjective music quality
Diffusion-based frameworks enable multi-preference alignment
Inference-time optimization enhances personalized music generation
Abstract
Recent advances in generative AI for music have achieved remarkable fidelity and stylistic diversity, yet these systems often fail to align with nuanced human preferences due to the specific loss functions they use. This paper advocates for the systematic application of preference alignment techniques to music generation, addressing the fundamental gap between computational optimization and human musical appreciation. Drawing on recent breakthroughs including MusicRL's large-scale preference learning, multi-preference alignment frameworks like diffusion-based preference optimization in DiffRhythm+, and inference-time optimization techniques like Text2midi-InferAlign, we discuss how these techniques can address music's unique challenges: temporal coherence, harmonic consistency, and subjective quality assessment. We identify key research challenges including scalability to long-form…
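As a rough illustration of what a preference alignment objective optimizes, the following is a minimal sketch of the generic Direct Preference Optimization (DPO) loss for a single pair of outputs in which a listener preferred one clip over another. It is not the implementation used by MusicRL, DiffRhythm+, or Text2midi-InferAlign. The function name `dpo_loss`, the use of scalar sequence log-likelihoods as inputs, and the default `beta` are assumptions made for illustration.

```python
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Generic DPO loss for one (preferred, rejected) pair (illustrative sketch).

    Inputs are log-likelihoods of the preferred ("chosen") and rejected
    outputs under the model being tuned and under a frozen reference model.
    """
    # How much more (or less) likely each output became relative to the reference model
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    # Scaled preference margin: positive when the preferred output gained more probability
    margin = beta * (chosen_ratio - rejected_ratio)
    # Loss = -log(sigmoid(margin)) = softplus(-margin), computed in a numerically stable way
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Usage: the preferred clip gained likelihood relative to the reference, the rejected one lost it
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss equals log 2 when the model has not moved away from the reference and shrinks as the tuned model shifts probability toward the preferred output, which is the basic mechanism that preference-based fine-tuning exploits.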
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
