Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods

Fang-Chih Hsieh; Wei-Jaw Lee; Chun-Ping Wang; Hung-yi Lee; Hao-Wen Dong; Yi-Hsuan Yang

arXiv:2605.21538 · cs.SD · May 22, 2026

TL;DR

The ICME 2026 Grand Challenge on Academic Text-to-Music Generation introduces a standardized benchmark with open datasets, baselines, and evaluation methods to promote academic research in text-to-music systems.

Contribution

It establishes a fair, reproducible benchmark with open resources and evaluation metrics for academic text-to-music generation research.

Findings

01. Provides open-source baselines and evaluation code.

02. Introduces a novel Concept Coverage Score (CCS).

03. Defines a multi-stage evaluation process including objective and subjective metrics.

Abstract

This paper presents an overview and the technical framework of the ICME 2026 Grand Challenge on Academic Text-to-Music Generation (ATTM). Despite rapid progress in text-to-music generation (TTM) systems, the field is currently dominated by models trained on massive proprietary datasets with industrial-scale computational resources, creating a significant barrier for academic research. To address this, the ATTM Challenge establishes a fair-play benchmark that requires participants to train generative models strictly from scratch using a standardized, CC-licensed subset of the MTG-Jamendo dataset containing only instrumental music. The challenge is divided into two tracks: the Efficiency Track (limited to 500M parameters) and the Performance Track (no parameter limit). Submissions are evaluated through a multi-stage process involving objective metrics, including Fréchet Audio…
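As a rough illustration of the two-track split described in the abstract, the sketch below counts a model's parameters from its tensor shapes and reports which tracks it could enter. It is not taken from the challenge's code: the function names, the toy shapes, the reading of "500M" as 500 × 10^6, and the choice to treat the limit as inclusive are all assumptions made for this example.

```python
from math import prod

# Assumption: "500M" means 500 * 10**6 parameters, and a model with
# exactly that many parameters still qualifies (inclusive limit).
EFFICIENCY_TRACK_MAX_PARAMS = 500_000_000


def count_parameters(shapes):
    """Sum the number of scalar parameters over a {name: shape} mapping."""
    return sum(prod(shape) for shape in shapes.values())


def eligible_tracks(num_params):
    """Return the tracks a model of this size could enter."""
    tracks = ["Performance"]  # the abstract states no parameter limit here
    if num_params <= EFFICIENCY_TRACK_MAX_PARAMS:
        tracks.insert(0, "Efficiency")
    return tracks


# Hypothetical toy model, used only to exercise the functions.
toy_shapes = {
    "embedding.weight": (1024, 512),
    "decoder.linear.weight": (512, 512),
    "decoder.linear.bias": (512,),
}

n = count_parameters(toy_shapes)
print(n, eligible_tracks(n))
```

With a PyTorch model, the same count is usually obtained by summing `p.numel()` over `model.parameters()`. Whether frozen or auxiliary components count toward the limit is something the challenge rules would need to settle.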
