RankME: Reliable Human Ratings for Natural Language Generation

Jekaterina Novikova; Ondřej Dušek; Verena Rieser

arXiv:1803.05928 · cs.CL · October 3, 2018

TL;DR

This paper introduces RankME, a novel evaluation method for natural language generation that enhances the reliability and consistency of human ratings through a rank-based magnitude estimation approach, enabling multi-criteria assessment and cost-effective system ranking.

Contribution

The paper proposes RankME, a new evaluation technique combining continuous scales and relative assessments to improve human judgment quality in NLG evaluation.
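
RankME builds on magnitude estimation. Raters give open-ended, ratio-scale scores instead of picking a point on a fixed Likert scale, and several outputs are judged relative to one another. The sketch below shows one common way such scores can be put on a comparable scale and turned into a relative ranking. It assumes each score is given relative to a reference item with a fixed value of 100, and it uses log-ratio normalization. Both are illustrative assumptions, not necessarily the paper's exact procedure. The helpers `normalize_me` and `to_ranking` are hypothetical names.

```python
import math
from typing import Dict, List


def normalize_me(scores: Dict[str, float], reference_score: float = 100.0) -> Dict[str, float]:
    """Hypothetical helper: log-ratio normalization of magnitude-estimation scores.

    Each score is divided by the (assumed) reference score and log-transformed,
    so "twice as good as the reference" becomes +log(2) and "half as good" -log(2).
    """
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    normalized = {}
    for system, score in scores.items():
        if score <= 0:
            raise ValueError(f"magnitude estimates must be positive, got {score} for {system}")
        normalized[system] = math.log(score / reference_score)
    return normalized


def to_ranking(scores: Dict[str, float]) -> List[str]:
    """Hypothetical helper: order systems best-first; ties are broken by name."""
    return sorted(scores, key=lambda system: (-scores[system], system))


# One rater's side-by-side scores for three outputs (toy numbers, not from the paper).
rating = {"sysA": 150.0, "sysB": 75.0, "sysC": 100.0}
print({name: round(value, 3) for name, value in normalize_me(rating).items()})
print(to_ranking(rating))
```

The log is monotonic, so a single rater's ranking is the same whether it is computed from raw or normalized scores. Normalization matters when scores are pooled across raters who each choose a very different numeric range.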

Findings

01. RankME significantly improves the reliability and consistency of human ratings.
02. It allows NLG systems to be evaluated against multiple, distinct criteria.
03. Combined with Bayesian estimation of system quality, RankME is a cost-effective way to rank multiple NLG systems.
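
The first finding concerns rating reliability. A standard statistic for inter-rater consistency is the intra-class correlation coefficient (ICC). The sketch below computes the two-way, consistency, single-rater form ICC(3,1) in pure Python. Choosing this particular ICC variant is an assumption made for illustration. It is not necessarily the variant reported in the paper. The function name `icc_3_1` is hypothetical.

```python
from typing import List


def icc_3_1(ratings: List[List[float]]) -> float:
    """ICC(3,1): two-way, consistency, single-rater intra-class correlation.

    ratings[i][j] is the score rater j gave to item i; every rater must rate
    every item. Systematic offsets between raters do not lower the value.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 items and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    item_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into item, rater and residual parts.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_items = k * sum((m - grand) ** 2 for m in item_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_items - ss_raters

    ms_items = ss_items / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_items + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_items - ms_error) / denominator


# Toy data: rater 2 is always one point more generous than rater 1, so the two
# raters are perfectly consistent and ICC(3,1) is 1.
consistent = [[1, 2], [2, 3], [3, 4]]
print(round(icc_3_1(consistent), 6))
```

The consistency form ignores a constant offset between raters. That is one reason it can be a reasonable fit for open-ended scales, where each rater picks their own numeric range.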

Abstract

Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relative assessments. We show that RankME significantly improves the reliability and consistency of human ratings compared to traditional evaluation methods. In addition, we show that it is possible to evaluate NLG systems according to multiple, distinct criteria, which is important for error analysis. Finally, we demonstrate that RankME, in combination with Bayesian estimation of system quality, is a cost-effective alternative for ranking multiple NLG systems.
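
The abstract pairs RankME with Bayesian estimation of system quality. The sketch below assumes a TrueSkill-style model. Each rater's side-by-side scores are turned into pairwise wins and losses. Each win then updates a Gaussian belief (mean `mu`, standard deviation `sigma`) over the quality of both systems involved. Only the two-player, no-draw TrueSkill update is implemented, and tied scores are skipped. The constants are the conventional TrueSkill defaults, assumed here rather than taken from the paper. The helper names `update` and `rank_systems` are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Rating = Tuple[float, float]  # (mu, sigma)

# Conventional TrueSkill defaults (an assumption, not taken from the paper).
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = SIGMA0 / 2    # per-comparison performance noise
TAU = SIGMA0 / 100   # dynamics term that stops sigma shrinking to zero


def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))


def update(winner: Rating, loser: Rating) -> Tuple[Rating, Rating]:
    """Two-player TrueSkill update for a single win (draws are not modelled)."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    var_w = sigma_w ** 2 + TAU ** 2
    var_l = sigma_l ** 2 + TAU ** 2
    c = math.sqrt(2 * BETA ** 2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / max(_cdf(t), 1e-12)  # mean-shift factor; guard against cdf underflow
    w = v * (v + t)                    # variance-shrink factor
    new_w = (mu_w + var_w / c * v, math.sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_l = (mu_l - var_l / c * v, math.sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_w, new_l


def rank_systems(judgements: List[Dict[str, float]]) -> List[Tuple[str, float, float]]:
    """Turn each rater's side-by-side scores into pairwise wins and update ratings.

    Returns (system, mu, sigma) tuples sorted by mu, best first.
    """
    ratings: Dict[str, Rating] = {}
    for scores in judgements:
        for system in scores:
            ratings.setdefault(system, (MU0, SIGMA0))
        for a, b in combinations(sorted(scores), 2):
            if scores[a] == scores[b]:
                continue  # ties skipped: this sketch has no draw model
            win, lose = (a, b) if scores[a] > scores[b] else (b, a)
            ratings[win], ratings[lose] = update(ratings[win], ratings[lose])
    return sorted(((s, mu, sigma) for s, (mu, sigma) in ratings.items()),
                  key=lambda item: -item[1])


# Three raters' side-by-side scores for the same three outputs (toy numbers).
judgements = [
    {"sysA": 140.0, "sysB": 100.0, "sysC": 60.0},
    {"sysA": 120.0, "sysB": 110.0, "sysC": 90.0},
    {"sysA": 200.0, "sysB": 80.0, "sysC": 50.0},
]
for system, mu, sigma in rank_systems(judgements):
    print(f"{system}: mu={mu:.2f} sigma={sigma:.2f}")
```

Ranking by `mu` is the simplest choice. A more conservative ordering sorts by `mu - 3 * sigma`, which penalizes systems that have been compared only a few times.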

Code & Models

Repositories

jeknov/RankME (official)
