Check-Eval: A Checklist-based Approach for Evaluating Text Quality

Jayr Pereira; Andre Assumpcao; Roberto Lotufo

arXiv:2407.14467 · cs.CL · September 11, 2024 · 1 citation

TL;DR

Check-Eval introduces a checklist-based framework leveraging large language models to evaluate generated text quality, achieving higher correlation with human judgments than existing metrics across benchmark datasets.

Contribution

It presents a novel, structured evaluation method combining checklist generation and assessment, improving alignment with human evaluations for natural language generation.

Findings

01. Outperforms existing metrics such as G-Eval and GPTScore in correlation with human judgments.

02. Works as both a reference-free and a reference-dependent evaluation method.

03. Validated on the Portuguese Legal Semantic Textual Similarity and SummEval datasets.
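
The findings are stated in terms of correlation with human judgments. As a rough illustration of how such a comparison can be computed, the sketch below implements Spearman rank correlation in plain Python. The choice of Spearman is an assumption (this page does not say which coefficient the authors report), and the score lists are made-up illustrative values, not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative (invented) scores: a metric's outputs vs. human ratings.
metric_scores = [0.2, 0.5, 0.4, 0.9]
human_scores = [1, 3, 2, 5]
rho = spearman(metric_scores, human_scores)
```

Because both invented lists order the four items identically, `rho` here is a perfect rank agreement; real metric-versus-human comparisons would fall somewhere below that.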

Abstract

Evaluating the quality of text generated by large language models (LLMs) remains a significant challenge. Traditional metrics often fail to align well with human judgments, particularly in tasks requiring creativity and nuance. In this paper, we propose Check-Eval, a novel evaluation framework leveraging LLMs to assess the quality of generated text through a checklist-based approach. Check-Eval can be employed as both a reference-free and reference-dependent evaluation method, providing a structured and interpretable assessment of text quality. The framework consists of two main stages: checklist generation and checklist evaluation. We validate Check-Eval on two benchmark datasets: Portuguese Legal Semantic Textual Similarity and SummEval. Our results demonstrate that Check-Eval achieves higher correlations with human judgments compared to…
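
The two stages named in the abstract, checklist generation and checklist evaluation, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the function names (`generate_checklist`, `evaluate_checklist`), the fraction-of-items scoring rule, and the `toy_llm` stand-in (simple substring matching so the example runs offline) are all hypothetical.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def generate_checklist(basis_text: str, llm: LLM) -> List[str]:
    """Stage 1 (assumed form): ask the model for key points, one per line."""
    reply = llm("LIST the key points of the text, one per line:\n" + basis_text)
    return [line.strip("-• ").strip() for line in reply.splitlines() if line.strip()]


def evaluate_checklist(checklist: List[str], candidate: str, llm: LLM) -> float:
    """Stage 2 (assumed form): one yes/no question per item.

    The score is the fraction of items answered "yes"; this scoring rule is
    an assumption of the sketch, not taken from the paper.
    """
    if not checklist:
        return 0.0
    hits = 0
    for item in checklist:
        reply = llm(f"CHECK\nPoint: {item}\nText: {candidate}")
        if reply.strip().lower().startswith("yes"):
            hits += 1
    return hits / len(checklist)


def toy_llm(prompt: str) -> str:
    """Offline stand-in for a real model: sentence splitting and substring checks."""
    if prompt.startswith("LIST"):
        text = prompt.split("\n", 1)[1]
        return "\n".join(s.strip() for s in text.split(".") if s.strip())
    point = prompt.split("Point: ", 1)[1].split("\n", 1)[0]
    text = prompt.split("Text: ", 1)[1]
    return "yes" if point.lower() in text.lower() else "no"


# Invented example texts, for illustration only.
basis = "The court ruled for the plaintiff. Damages were set at 10000."
candidate = "The court ruled for the plaintiff yesterday."

checklist = generate_checklist(basis, toy_llm)
score = evaluate_checklist(checklist, candidate, toy_llm)
```

In this sketch the text passed to `generate_checklist` could be a reference text or some other basis, which is one way to picture the reference-dependent and reference-free modes; how the paper actually distinguishes them is not specified on this page.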

Taxonomy

Topics: Pharmacy and Medical Practices

Methods: ALIGN