Do Captioning Metrics Reflect Music Semantic Alignment?

Jinwoo Lee; Kyogu Lee

arXiv:2411.11692 · cs.SD · November 19, 2024

TL;DR

This paper critically examines whether traditional language generation metrics such as BLEU and ROUGE are suitable for evaluating music captioning, highlighting their poor correlation with human judgments and their vulnerability to syntactic variation.

Contribution

The paper shows that existing metrics are inadequate for evaluating music captioning and advocates developing more appropriate, semantically aligned assessment methods.

Findings

01. Traditional metrics do not correlate well with human judgments.

02. Existing metrics are vulnerable to syntactic changes.

03. Evaluation standards for music captioning need to be critically reevaluated.
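The first finding concerns agreement between metric scores and human judgments. A common way to quantify that agreement is a rank correlation such as Kendall's tau-b, computed over a set of captions that have both a metric score and a human rating. This is an assumption: this page does not state which statistic the authors used. The sketch implements tau-b directly with the standard library. The function name `kendall_tau_b` is hypothetical, and the ratings and scores in the usage lines are made-up toy numbers for exercising the function, not results from the paper.

```python
import math
from itertools import combinations


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b rank correlation with tie correction on both sides (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        # Tied pairs feed the tie-correction terms and are excluded from C and D.
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx == 0 or dy == 0:
            continue
        if (dx > 0) == (dy > 0):
            concordant += 1  # both sequences order this pair the same way
        else:
            discordant += 1  # the sequences disagree on this pair
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        return float("nan")  # undefined when one side is entirely tied
    return (concordant - discordant) / denom


# Toy, made-up values for five captions (NOT data from the paper).
human_ratings = [4.5, 2.0, 3.5, 1.0, 4.0]
metric_scores = [0.32, 0.41, 0.18, 0.25, 0.30]
print(kendall_tau_b(human_ratings, metric_scores))  # 0.2
```

A value near 1 means the metric ranks captions the way humans do, while a value near 0 means it carries little information about human preference. SciPy's `scipy.stats.kendalltau` computes the same tau-b statistic by default, so the hand-rolled version is only there to make the computation explicit.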

Abstract

Music captioning has emerged as a promising task, fueled by the advent of advanced language generation models. However, the evaluation of music captioning relies heavily on traditional metrics such as BLEU, METEOR, and ROUGE, which were developed for other domains, without proper justification for their use in this new field. We present cases where traditional metrics are vulnerable to syntactic changes and show that they do not correlate well with human judgments. By addressing these issues, we aim to emphasize the need for a critical reevaluation of how music captions are assessed.
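The claim that these metrics are vulnerable to syntactic changes can be illustrated with a minimal sketch. It implements two simplified ingredients from scratch, clipped n-gram precision (the core of BLEU) and an LCS-based ROUGE-L F1, rather than calling an official toolkit, and it assumes plain lowercase whitespace tokenization. The helper names and the three captions are hypothetical examples written for this illustration; they are not the cases or code from the paper.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified lowercase whitespace tokenization (an assumption; real toolkits tokenize more carefully).
    return text.lower().split()


def ngram_precision(candidate: list[str], reference: list[str], n: int) -> float:
    """Clipped n-gram precision against a single reference, the building block of BLEU."""
    cand_counts = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return overlap / total


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence over tokens.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate: list[str], reference: list[str]) -> float:
    """ROUGE-L F-measure with beta = 1: harmonic mean of LCS precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


reference = tokenize("a calm piano melody with soft strings in the background")
# Same meaning as the reference, different word order.
reordered = tokenize("soft strings in the background accompany a calm piano melody")
# Nearly identical wording, contradictory meaning ("loud" instead of "soft").
contradiction = tokenize("a calm piano melody with loud strings in the background")

for name, cand in [("reordered", reordered), ("contradiction", contradiction)]:
    precisions = [round(ngram_precision(cand, reference, n), 3) for n in (1, 2, 3, 4)]
    print(name, precisions, round(rouge_l_f1(cand, reference), 3))
# reordered [0.9, 0.778, 0.625, 0.429] 0.5
# contradiction [0.9, 0.778, 0.625, 0.429] 0.9
```

Both candidates have the same length as the reference, so BLEU's brevity penalty is 1 and every clipped precision is identical: a BLEU-4 score built from these precisions cannot distinguish a meaning-preserving reordering from a one-word contradiction. ROUGE-L, which rewards the longest in-order overlap, goes further and scores the contradictory caption (0.9) well above the faithful reordering (0.5).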

Taxonomy

Topics: Music and Audio Processing · Natural Language Processing Techniques · Translation Studies and Practices