What Makes a Good Paraphrase: Do Automated Evaluations Work?

Anna Moskvina; Bhushan Kotnis; Chris Catacata; Michael Janz; Nasrin Saef

arXiv:2307.14818 · cs.CL · July 28, 2023

Open Access

TL;DR

This paper investigates the criteria for acceptable paraphrases and evaluates the effectiveness of automated metrics versus expert judgment in assessing paraphrase quality, using experiments on a German dataset.

Contribution

It provides an empirical analysis of automated evaluation metrics for paraphrasing and compares them with expert linguistic assessments.

Findings

1. Automated metrics show varying correlation with expert judgments.
2. Certain automated metrics can partially predict paraphrase quality.
3. The study highlights limitations of current automated evaluation methods.

Abstract

Paraphrasing is the task of expressing an essential idea or meaning in different words. But how different should the words be in order to be considered an acceptable paraphrase? And can we exclusively use automated metrics to evaluate the quality of a paraphrase? We attempt to answer these questions by conducting experiments on a German data set and performing automatic and expert linguistic evaluation.
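The second question in the abstract, whether automated metrics can stand in for expert judgment, is usually checked by correlating metric scores with human ratings. The following is a minimal sketch of that kind of check, not the paper's method: the word-overlap score, the function names, and the sample pairs and ratings are all assumptions invented for illustration and do not come from the paper's data set.

```python
import math


def jaccard_overlap(source, paraphrase):
    """Word-level Jaccard overlap between two sentences (0 = disjoint, 1 = same word set).

    This is a stand-in metric chosen for simplicity; it is an assumption,
    not one of the metrics evaluated in the paper.
    """
    a = set(source.lower().split())
    b = set(paraphrase.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical German sentence pairs with made-up expert ratings (1 = poor, 5 = good).
    pairs = [
        ("Das Wetter ist heute schön", "Heute ist das Wetter schön", 4),
        ("Das Wetter ist heute schön", "Heute scheint die Sonne", 5),
        ("Das Wetter ist heute schön", "Ich mag keinen Regen", 1),
        ("Der Zug hat Verspätung", "Die Bahn kommt später an", 5),
    ]
    metric_scores = [jaccard_overlap(src, para) for src, para, _ in pairs]
    expert_scores = [rating for _, _, rating in pairs]
    print("Spearman correlation:", spearman(metric_scores, expert_scores))
```

A rank correlation is used here because expert ratings are typically ordinal. Note that a pure overlap score rewards near-copies, so a paraphrase that experts rate highly for using different words can score low, which is one way such a metric and expert judgment can diverge.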

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems