HeartMuLa: A Family of Open Sourced Music Foundation Models
Dongchao Yang, Yuxin Xie, Yuguo Yin, Zheyu Wang, Xiaoyu Yi, Gongxi Zhu, Xiaolong Weng, Zihan Xiong, Yingzhe Ma, Dading Cong, Jingliang Liu, Zihang Huang, Jinghan Ru, Rongjie Huang, Haoran Wan, Peixu Wang, Kuoxi Yu, Helin Wang, Liming Liang, Xianwei Zhuang, Yuanyuan Wang

TL;DR
HeartMuLa is a suite of open-source music foundation models for music understanding and high-fidelity, user-controllable music generation. The models scale up to 7B parameters.
Contribution
This work presents a family of open-source models for music understanding and generation, covering audio-text alignment, lyric recognition, music codec tokenization, and song generation, with model sizes scaling up to 7B parameters.
Findings
HeartMuLa models achieve high-quality music generation.
Scaling to 7B parameters significantly improves performance.
Open-source models serve as strong baselines for future research.
Abstract
We present a family of open-source Music Foundation Models designed to advance large-scale music understanding and generation across diverse tasks and modalities. Our framework consists of four major components: (1) HeartCLAP, an audio-text alignment model; (2) HeartTranscriptor, a robust lyric recognition model optimized for real-world music scenarios; (3) HeartCodec, a low-frame-rate (12.5 Hz) yet high-fidelity music codec tokenizer that captures long-range musical structure while preserving fine-grained acoustic details and enabling efficient autoregressive modeling; and (4) HeartMuLa, an LLM-based song generation model capable of synthesizing high-fidelity music under rich, user-controllable conditions (e.g., textual style descriptions, lyrics, and reference audio). In addition, it provides two specialized modes: (i) fine-grained musical attribute control, which allows users to…
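To give a feel for what a 12.5 Hz frame rate means for sequence length, the following is a minimal arithmetic sketch. It uses only the frame rate stated in the abstract. The function name `codec_frames` and the example durations are illustrative assumptions, and the number of tokens per frame (for example, how many codebooks HeartCodec uses) is not stated in this excerpt, so the sketch counts frames only.

```python
import math

# Frame rate stated in the abstract for HeartCodec.
FRAME_RATE_HZ = 12.5


def codec_frames(duration_seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of codec frames needed to cover a clip.

    Hypothetical helper for illustration: it rounds up so that a partial
    final frame is still counted. Tokens per frame are NOT modeled here.
    """
    return math.ceil(duration_seconds * frame_rate_hz)


# Example durations (assumed, not taken from the paper).
for seconds in (30, 180, 240):
    print(f"{seconds:>4} s -> {codec_frames(seconds)} frames")
```

At this rate a three-minute song spans 2250 frames, which is the kind of sequence length the abstract refers to when it describes the codec as enabling efficient autoregressive modeling.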
Code & Models
- 🤗 HeartMuLa/HeartMuLa-oss-3B-happy-new-year (model) · 2.9k dl · ♡ 25
- 🤗 HeartMuLa/HeartTranscriptor-oss (model) · 639 dl · ♡ 19
- 🤗 HeartMuLa/HeartMuLa-oss-3B (model) · 1.6k dl · ♡ 253
- 🤗 HeartMuLa/HeartMuLaGen (model) · ♡ 31
- 🤗 backups2/HeartMuLaGen (model)
- 🤗 backups2/HeartMuLa-oss-3B (model)
- 🤗 backups2/HeartCodec-oss (model) · 13 dl · ♡ 1
- 🤗 backups2/HeartTranscriptor-oss (model)
- 🤗 Ademola265/HeartMuLa-oss-3B (model) · 8 dl · ♡ 2
- 🤗 Ademola265/HeartCodec-oss (model) · 34 dl · ♡ 2
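The repositories listed above are hosted on the Hugging Face Hub, so their files can be fetched with the `huggingface_hub` package. The sketch below is an assumption about how one might download a checkpoint, not the authors' documented workflow: `fetch_checkpoint` and `OFFICIAL_REPOS` are hypothetical names, and loading or running the models is not covered because this page does not describe the inference code.

```python
from typing import Optional

# Repository IDs copied from the list above (HeartMuLa organization only).
OFFICIAL_REPOS = [
    "HeartMuLa/HeartMuLa-oss-3B",
    "HeartMuLa/HeartMuLa-oss-3B-happy-new-year",
    "HeartMuLa/HeartTranscriptor-oss",
    "HeartMuLa/HeartMuLaGen",
]


def fetch_checkpoint(repo_id: str, local_dir: Optional[str] = None) -> str:
    """Download every file of a Hub repository and return the local path.

    Hypothetical helper: it only wraps huggingface_hub.snapshot_download.
    The import is deferred so this module can be imported without the
    package installed; calling the function requires it and network access.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Usage (requires `pip install huggingface_hub` and network access):
# path = fetch_checkpoint("HeartMuLa/HeartMuLa-oss-3B")
# print(path)
```

The third-party mirrors in the list (`backups2/...`, `Ademola265/...`) are left out of `OFFICIAL_REPOS` on the assumption that the `HeartMuLa/` organization is the primary source.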
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Artificial Intelligence in Games
