What Makes a Good Paraphrase: Do Automated Evaluations Work?
Anna Moskvina, Bhushan Kotnis, Chris Catacata, Michael Janz, Nasrin Saef

TL;DR
This paper investigates the criteria for acceptable paraphrases and evaluates the effectiveness of automated metrics versus expert judgment in assessing paraphrase quality, using experiments on a German dataset.
Contribution
It provides an empirical analysis of automated evaluation metrics for paraphrasing and compares them with expert linguistic assessments.
Findings
Automated metrics show varying correlation with expert judgments.
Certain automated metrics can partially predict paraphrase quality.
The study highlights limitations of current automated evaluation methods.
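As an illustration of what "correlation with expert judgments" can mean in practice, the sketch below computes a Spearman rank correlation between an automated metric's scores and expert ratings for the same paraphrases. It is a minimal example under stated assumptions, not the paper's procedure: the summary does not say which metrics or which correlation statistic the authors used, and the scores and ratings here are invented toy values, not results from the paper.

```python
# Toy sketch: rank correlation between an automated metric and expert ratings.
# All numbers below are hypothetical illustration values, not data from the paper.

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical metric scores and expert ratings for five paraphrase pairs.
metric_scores = [0.91, 0.42, 0.77, 0.30, 0.65]
expert_ratings = [4, 2, 5, 1, 3]

rho = spearman(metric_scores, expert_ratings)
print(round(rho, 3))
```

A value near 1 would mean the metric orders paraphrases much as the experts do; a value near 0 would mean the metric's ordering carries little information about expert judgment.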
Abstract
Paraphrasing is the task of expressing an essential idea or meaning in different words. But how different should the words be in order to be considered an acceptable paraphrase? And can we exclusively use automated metrics to evaluate the quality of a paraphrase? We attempt to answer these questions by conducting experiments on a German data set and performing automatic and expert linguistic evaluation.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
