Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods
Fang-Chih Hsieh, Wei-Jaw Lee, Chun-Ping Wang, Hung-yi Lee, Hao-Wen Dong, Yi-Hsuan Yang

TL;DR
The ICME 2026 Grand Challenge on Academic Text-to-Music Generation introduces a standardized benchmark with open datasets, baselines, and evaluation methods to promote academic research in text-to-music systems.
Contribution
It establishes a fair, reproducible benchmark with open resources and evaluation metrics for academic text-to-music generation research.
Findings
Provides open-source baselines and evaluation code.
Introduces a novel Concept Coverage Score (CCS).
Defines a multi-stage evaluation process including objective and subjective metrics.
Abstract
This paper presents an overview and the technical framework of the ICME 2026 Grand Challenge on Academic Text-to-Music Generation (ATTM). Despite the rapid progress in text-to-music generation (TTM) systems, the field is currently dominated by models trained on massive proprietary datasets with industrial-scale computational resources, creating a significant barrier for academic research. To address this, the ATTM Challenge establishes a fair-play benchmark that requires participants to train generative models strictly from scratch using a standardized, CC-licensed subset of the MTG-Jamendo dataset containing only instrumental music. The challenge is divided into two tracks: the Efficiency Track (limited to 500M parameters) and the Performance Track (no parameter limit). Submissions are evaluated through a multi-stage process involving objective metrics, including Fréchet Audio…
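The parameter cap on the Efficiency Track can be illustrated with a small self-check. The following is a minimal sketch, not the challenge's official tooling: the helper names, the example tensor shapes, and the choice to treat the 500M limit as inclusive are all assumptions made for illustration. Only the 500M figure comes from the abstract.

```python
from math import prod

# 500M parameters, the Efficiency Track limit stated in the abstract.
EFFICIENCY_TRACK_LIMIT = 500_000_000


def count_parameters(shapes):
    """Sum the element counts of all weight tensors.

    `shapes` maps a (hypothetical) parameter name to its tensor shape.
    """
    return sum(prod(shape) for shape in shapes.values())


def within_efficiency_limit(num_params):
    """Return True if the model fits the Efficiency Track cap.

    Assumption: the limit is inclusive (<=); the abstract does not say.
    """
    return num_params <= EFFICIENCY_TRACK_LIMIT


# Hypothetical toy model, used only to exercise the helpers.
toy_shapes = {
    "embed.weight": (1024, 512),
    "proj.weight": (512, 512),
    "proj.bias": (512,),
}

total = count_parameters(toy_shapes)
print(total, within_efficiency_limit(total))
```

In a real framework, the shape dictionary would typically be built from the model's own parameter listing. The challenge's official counting rules (for example, whether frozen or auxiliary components count toward the limit) are not given in the abstract and should be taken from the challenge documentation.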
