Understanding Metrics for Paraphrasing
Omkar Patil, Rahul Singh, Tarun Joshi

TL;DR
This paper introduces a new metric, ROUGE_P, for evaluating paraphrases by capturing adequacy, novelty, and fluency, addressing limitations of existing metrics and improving paraphrase quality assessment.
Contribution
The paper proposes ROUGE_P, a novel paraphrase evaluation metric, and provides empirical evidence that existing metrics are inadequate for measuring paraphrase quality.
Findings
Current metrics fail to fully capture paraphrase quality.
ROUGE_P effectively measures adequacy, novelty, and fluency.
Empirical results show improved evaluation of paraphrasing models.
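To make the first finding concrete, here is a minimal sketch of why a plain n-gram overlap score can mislead when judging paraphrases. It is not the paper's ROUGE_P, whose definition is not given in this summary. The helper names (`unigram_f1`, `novelty`) and the simple novelty formula are hypothetical, and the setup assumes the source sentence doubles as the reference.

```python
from collections import Counter


def tokens(text):
    # Naive whitespace tokenization; a real evaluation would use a proper tokenizer.
    return text.lower().split()


def unigram_f1(candidate, reference):
    """ROUGE-1-style F1 based on clipped unigram overlap."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def novelty(candidate, source):
    """Illustrative novelty score: share of candidate tokens not found in the source."""
    cand, src = Counter(tokens(candidate)), Counter(tokens(source))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((cand & src).values()) / total


source = "the cat sat on the mat"
copy = "the cat sat on the mat"          # verbatim copy, not a useful paraphrase
rewrite = "a feline rested on the rug"   # genuine rewording

copy_scores = (unigram_f1(copy, source), novelty(copy, source))
rewrite_scores = (unigram_f1(rewrite, source), novelty(rewrite, source))
print("copy   ", copy_scores)
print("rewrite", rewrite_scores)
```

Under these assumptions the verbatim copy gets the highest possible overlap score while contributing no new wording, and the genuine rewording scores much lower on overlap. That is the kind of borderline case the abstract says borrowed metrics handle poorly.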
Abstract
Paraphrase generation is a difficult problem. This is not only because of limitations in text generation capabilities but also because of the lack of a proper definition of what qualifies as a paraphrase and of corresponding metrics to measure how good it is. Developing metrics for evaluating paraphrase quality is an ongoing research problem. Most of the existing metrics in use, having been borrowed from other tasks, do not capture the complete essence of a good paraphrase and often fail at borderline cases. In this work, we propose a novel metric to measure the quality of paraphrases along the dimensions of adequacy, novelty and fluency. We also provide empirical evidence to show that the current natural language generation metrics are insufficient to measure these desired properties of a good paraphrase. We look at paraphrase model fine-tuning and generation from the lens of…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
