WRDScore: New Metric for Evaluation of Natural Language Generation Models

Ravil Mussabayev

arXiv:2405.19220 · cs.CL · August 14, 2024

TL;DR

This paper introduces WRDScore, a novel evaluation metric for natural language generation that uses optimal transport theory to better capture semantic and syntactic variations, outperforming traditional metrics.

Contribution

We propose WRDScore, a lightweight, normalized, and effective metric based on optimal transport, addressing limitations of existing evaluation methods for language generation.

Findings

01. WRDScore correlates better with human judgments than existing metrics.

02. It balances precision and recall effectively in evaluation.

03. Experiments show WRDScore's superiority over traditional metrics.

Abstract

Evaluating natural language generation models, particularly for method name prediction, poses significant challenges. A robust metric must account for the versatility of method naming, considering both semantic and syntactic variations. Traditional overlap-based metrics, such as ROUGE, fail to capture these nuances. Existing embedding-based metrics often suffer from imbalanced precision and recall, lack normalized scores, or make unrealistic assumptions about sequences. To address these limitations, we leverage the theory of optimal transport and construct WRDScore, a novel metric that strikes a balance between simplicity and effectiveness. In the WRDScore framework, we define precision as the maximum degree to which the predicted sequence's tokens are included in the reference sequence, token by token. Recall is calculated as the total cost of the optimal transport plan that maps the…
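
The two ingredients named in the abstract, a token-by-token inclusion score and an optimal transport cost between token sequences, can be illustrated with a toy sketch. This is not the paper's definition of WRDScore: the abstract is truncated before the full formulas, so everything below is an assumption made for illustration. The 2-D vectors in `EMB` are made up, the cost is taken to be 1 minus cosine similarity, every token carries equal mass, and the names `greedy_inclusion` and `ot_cost` are hypothetical. The transport problem is solved by brute force, which is only practical for very short sequences such as method names.

```python
from itertools import permutations
from math import gcd, sqrt

# Hypothetical 2-D token embeddings, invented for this example only.
EMB = {
    "get": (1.0, 0.0),
    "fetch": (2.0, 0.0),  # same direction as "get", i.e. treated as a synonym
    "user": (1.0, 1.0),
    "name": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def greedy_inclusion(src, dst):
    """Mean, over tokens in src, of the best cosine match found in dst."""
    return sum(max(cosine(s, d) for d in dst) for s in src) / len(src)


def ot_cost(src, dst):
    """Exact optimal transport cost with uniform token masses (brute force).

    Both sides are replicated up to lcm(len(src), len(dst)) equal-mass points,
    so an optimal plan can be found among permutations. Assumed cost between
    two tokens: 1 - cosine similarity.
    """
    n, m = len(src), len(dst)
    size = n * m // gcd(n, m)
    a = [x for x in src for _ in range(size // n)]
    b = [y for y in dst for _ in range(size // m)]
    cost = [[1.0 - cosine(x, y) for y in b] for x in a]
    best = min(
        sum(cost[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(size))
    )
    return best / size


# Predicted and reference method names, already split into sub-tokens.
pred = [EMB[t] for t in ("fetch", "name")]
ref = [EMB[t] for t in ("get", "user", "name")]

print("inclusion of prediction in reference:", greedy_inclusion(pred, ref))
print("inclusion of reference in prediction:", greedy_inclusion(ref, pred))
print("optimal transport cost:", ot_cost(pred, ref))
```

In this toy case every predicted token has a perfect match in the reference, while the reference token `user` has no close counterpart in the prediction, so the two inclusion scores differ. The transport cost, by contrast, is a single symmetric quantity that accounts for the unmatched mass.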

Code & Models

Repositories

rmusab/wrd-score
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems