Armor: A Benchmark for Meta-evaluation of Artificial Music

Songhe Wang; Zheng Bao; Jingtong E

arXiv:2108.12973·cs.SD·August 31, 2021

TL;DR

This paper introduces Armor, a comprehensive benchmark dataset designed to evaluate the effectiveness of objective evaluation methods in artificial music, aiming to bridge the gap with subjective human judgment.

Contribution

According to the authors, Armor is the first rigorous, cross-domain benchmark dataset for meta-evaluating objective music evaluation methods against human judgment.

Findings

01. Significant gap between objective and subjective evaluations.

02. Armor provides a standardized framework for future research.

03. Objective methods still lag behind human judgment in music quality assessment.

Abstract

Objective evaluation (OE) is essential to artificial music, but the quality of an OE is often hard to determine. Subjective evaluation (SE) remains the reliable and prevailing approach, yet it suffers from inherent disadvantages that OEs may overcome. A meta-evaluation system is therefore necessary for designers to test the effectiveness of OEs. In this paper, we present Armor, a complex, cross-domain benchmark dataset that serves this purpose. Since OEs should correlate with human judgment, we provide music as test cases for OEs and human judgment scores as touchstones. We also provide two meta-evaluation scenarios and their corresponding testing methods to assess the effectiveness of OEs. To the best of our knowledge, Armor is the first comprehensive and rigorous framework that future works could follow, learn from, and improve upon for the task of evaluating…
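The abstract's core idea, that an OE should correlate with human judgment, can be illustrated with a rank correlation between OE scores and human scores over the same pieces. The sketch below is a minimal illustration, not Armor's actual testing method: the choice of Spearman correlation, the function names, and the score values are all assumptions made for this example, since the excerpt does not specify the paper's two scenarios or their tests.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(oe_scores, human_scores):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(oe_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    return pearson(average_ranks(oe_scores), average_ranks(human_scores))


# Hypothetical data: five music pieces, each with a mean human rating
# and a score from some objective metric (values invented for illustration).
human_scores = [4.5, 2.0, 3.5, 1.0, 5.0]
oe_scores = [0.81, 0.40, 0.35, 0.10, 0.92]

rho = spearman(oe_scores, human_scores)
print(f"Spearman correlation between OE and human scores: {rho:.2f}")
```

A value near 1 would suggest the hypothetical metric orders pieces much as listeners do; a value near 0 would suggest it captures little of what humans judge. Rank correlation is used here only because it does not require the OE and the human ratings to share a scale.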
