A Comparative Study of Quality Evaluation Methods for Text Summarization

Huyen Nguyen; Haihua Chen; Lavanya Pobbathi; Junhua Ding

arXiv:2407.00747 · cs.CL · July 2, 2024 · 6 cites

TL;DR

This paper proposes an LLM-based method for evaluating text summarization and shows, on patent-document datasets, that it aligns more closely with human judgment than traditional automatic metrics do.

Contribution

The paper presents an LLM-based evaluation approach and compares it with eight automatic metrics and human evaluation across seven types of summarization models.

Findings

01. LLM evaluation aligns closely with human judgment
02. Traditional metrics like ROUGE-2 and BERTScore lack consistency
03. Proposed framework improves automatic evaluation of summarization

Abstract

Evaluating text summarization has been a challenging task in natural language processing (NLP). Automatic metrics that rely heavily on reference summaries are unsuitable in many situations, while human evaluation is time-consuming and labor-intensive. To bridge this gap, this paper proposes a novel method based on large language models (LLMs) for evaluating text summarization. We also conduct a comparative study of eight automatic metrics, human evaluation, and our proposed LLM-based method. Seven different types of state-of-the-art (SOTA) summarization models were evaluated. We perform extensive experiments and analysis on datasets of patent documents. Our results show that LLM evaluation aligns closely with human evaluation, while widely used automatic metrics such as ROUGE-2, BERTScore, and SummaC do not and also lack consistency. Based on the empirical comparison, we propose…
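To make the abstract's point about reference-dependent metrics concrete, here is a minimal sketch of a ROUGE-2-style score: the F1 of bigram overlap between a candidate summary and a reference. It is a simplified illustration, not the paper's evaluation code. The whitespace tokenization, the function names (`bigrams`, `rouge2_f1`), and the example sentences are all assumptions made for this sketch; standard ROUGE implementations add stemming and other preprocessing.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs after lowercasing and whitespace splitting
    (a simplifying assumption; real ROUGE tooling tokenizes more carefully)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate, reference):
    """Simplified ROUGE-2: F1 over clipped bigram overlap with one reference."""
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, invented purely for illustration.
reference = "the device stores energy in a rechargeable battery"
verbatim = "the device stores energy in a rechargeable battery"
paraphrase = "power is kept by the apparatus using a cell that can be recharged"

print(rouge2_f1(verbatim, reference))
print(rouge2_f1(paraphrase, reference))
```

The verbatim copy scores 1.0, while the paraphrase scores 0.0 because it shares no bigram with the reference even though it conveys the same meaning. This is one way a metric tied to reference wording can diverge from human judgment.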

Taxonomy

Topics: Advanced Text Analysis Techniques · Topic Modeling · Data Quality and Management

Methods: Softmax · Attention Is All You Need