Identifying Reliable Evaluation Metrics for Scientific Text Revision
Léane Jourdan, Florian Boudin, Richard Dufour, Nicolas Hernandez

TL;DR
This paper evaluates various metrics for scientific text revision, highlighting limitations of traditional similarity-based metrics and proposing a hybrid approach combining LLM judgments and domain-specific metrics for better assessment.
Contribution
The study identifies the shortcomings of existing evaluation metrics and introduces a hybrid method that improves the reliability of assessing scientific text revisions.
Findings
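LLM judges assess instruction-following effectively but struggle with correctness.
Traditional metrics like ROUGE and BERTScore measure similarity to a reference, not the quality of a revision.
A hybrid approach combining LLM-as-a-judge evaluation with task-specific metrics improves evaluation reliability.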
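The point about similarity-based metrics can be illustrated with a minimal sketch. It is not the paper's evaluation code: unigram_f1 is a simplified ROUGE-1-style overlap score written for this illustration, and the three sentences are invented examples. Under these assumptions, an almost unrevised copy of the original sentence scores far higher against the reference than a rewrite that shares few words with it, regardless of how good that rewrite is.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1: clipped unigram overlap on whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, invented for illustration only.
original = "we propose a new method that get better results than baselines"
reference = "we propose a new method that gets better results than the baselines"
rewrite = "our approach outperforms the baselines"

# Score of leaving the text essentially unchanged versus rewriting it.
score_copy = unigram_f1(original, reference)
score_rewrite = unigram_f1(rewrite, reference)

print(f"unrevised copy: {score_copy:.2f}")
print(f"concise rewrite: {score_rewrite:.2f}")
```

The score only reflects word overlap with one reference, so it cannot tell whether a revision followed the instruction or improved the text, which is the gap the LLM-as-a-judge and task-specific metrics are meant to cover.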
Abstract
Evaluating text revision in scientific writing remains a challenge, as traditional metrics such as ROUGE and BERTScore primarily focus on similarity rather than capturing meaningful improvements. In this work, we analyse and identify the limitations of these metrics and explore alternative evaluation methods that better align with human judgments. We first conduct a manual annotation study to assess the quality of different revisions. Then, we investigate reference-free evaluation metrics from related NLP domains. Additionally, we examine LLM-as-a-judge approaches, analysing their ability to assess revisions with and without a gold reference. Our results show that LLMs effectively assess instruction-following but struggle with correctness, while domain-specific metrics provide complementary insights. We find that a hybrid approach combining LLM-as-a-judge evaluation and task-specific…
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling · Academic Writing and Publishing
Methods: Focus · ALIGN
