# A Survey on Evaluation Metrics for Music Generation

**Authors:** Faria Binte Kader, Santu Karmaker

arXiv: 2509.00051 · 2025-09-03

## TL;DR

This survey reviews current evaluation metrics for music generation, highlighting their limitations and proposing future research directions to develop a comprehensive framework that better aligns with human perception.

## Contribution

The paper provides a detailed taxonomy of evaluation metrics for audio and symbolic music, critically analyzes their limitations, and suggests future research directions.

## Key findings

- Current objective metrics often correlate poorly with human perception
- Major limitations include cross-cultural bias and a lack of standardization, which hinders cross-model comparisons
- The authors propose future research directions toward a comprehensive evaluation framework
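
One common way to quantify how well an objective metric tracks human judgment is a rank correlation between metric scores and listener ratings over the same set of generated clips. The sketch below is illustrative only: the summary does not say which correlation statistic the survey relies on, so Spearman's rank correlation is an assumption here, and the function names (`average_ranks`, `pearson`, `spearman`) and all numbers are hypothetical rather than taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_ratings):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(human_ratings):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))


# Hypothetical data for five generated clips (not from the paper).
metric_scores = [0.91, 0.85, 0.78, 0.60, 0.55]  # some objective metric, higher = better
human_ratings = [3.0, 4.5, 2.5, 4.0, 3.5]       # mean listener rating per clip
rho = spearman(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho near zero, as with this made-up data, would be one concrete form of the "poor correlation" finding; values near 1 would indicate that the metric orders clips much as listeners do.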

## Abstract

Despite significant advances in music generation systems, methodologies for evaluating generated music have not progressed as expected, owing to the complex nature of music, which involves aspects such as structure, coherence, creativity, and emotional expressiveness. In this paper, we shed light on this research gap and introduce a detailed taxonomy of evaluation metrics for both audio and symbolic music representations. We include a critical review identifying major limitations in current evaluation methodologies, including poor correlation between objective metrics and human perception, cross-cultural bias, and a lack of standardization that hinders cross-model comparisons. To address these gaps, we further propose future research directions toward building a comprehensive evaluation framework for music generation.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00051/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00051/full.md

## References

122 references — full list in the complete paper: https://tomesphere.com/paper/2509.00051/full.md

---
Source: https://tomesphere.com/paper/2509.00051