ChatGPT as a Factual Inconsistency Evaluator for Text Summarization
Zheheng Luo, Qianqian Xie, Sophia Ananiadou

TL;DR
This paper investigates ChatGPT's ability to evaluate factual consistency in text summarization, demonstrating its potential to outperform existing metrics but also revealing some limitations in understanding and reasoning.
Contribution
It introduces the use of ChatGPT for factual inconsistency evaluation in summarization, exploring its effectiveness across multiple evaluation tasks in a zero-shot setting.
Findings
ChatGPT outperforms previous metrics in factual inconsistency evaluation
It shows strong performance in binary entailment, ranking, and rating tasks
Limitations include lexical bias, false reasoning, and instruction understanding issues
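A minimal sketch of how zero-shot prompts for two of the tasks above (binary entailment and rating) might be assembled and their free-text replies parsed. The prompt wording, the function names, and the 1-10 rating scale are illustrative assumptions, not the prompts or settings used in the paper. No model is called here, so the reply strings in the usage lines are made up for demonstration.

```python
import re
from typing import Optional


def build_entailment_prompt(document: str, summary: str) -> str:
    """Hypothetical zero-shot prompt for binary consistency judgement."""
    return (
        "Decide whether the summary is factually consistent with the article. "
        "Answer 'yes' or 'no'.\n"
        f"Article: {document}\n"
        f"Summary: {summary}\n"
        "Answer:"
    )


def build_rating_prompt(document: str, summary: str, low: int = 1, high: int = 10) -> str:
    """Hypothetical zero-shot prompt for a consistency score (scale is assumed)."""
    return (
        f"Rate the factual consistency of the summary with the article "
        f"on a scale from {low} to {high}.\n"
        f"Article: {document}\n"
        f"Summary: {summary}\n"
        "Score:"
    )


def parse_yes_no(response: str) -> Optional[bool]:
    """Return True/False for the first standalone 'yes'/'no', else None."""
    match = re.search(r"\b(yes|no)\b", response.lower())
    if match is None:
        return None  # the model did not follow the answer format
    return match.group(1) == "yes"


def parse_rating(response: str, low: int = 1, high: int = 10) -> Optional[int]:
    """Return the first integer in [low, high] found in the reply, else None."""
    for token in re.findall(r"\d+", response):
        value = int(token)
        if low <= value <= high:
            return value
    return None


# Usage with made-up replies (no model is queried):
prompt = build_entailment_prompt("A cat sat on the mat.", "A dog sat on the mat.")
verdict = parse_yes_no("No, the summary changes the animal.")
score = parse_rating("Score: 3/10")
```

Returning `None` for replies that do not match the requested format keeps instruction-following failures visible as a separate outcome rather than silently counting them as "inconsistent".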
Abstract
The performance of text summarization has been greatly boosted by pre-trained language models. A main concern with existing methods is that most generated summaries are not factually consistent with their source documents. To alleviate the problem, many efforts have focused on developing effective factuality evaluation metrics based on natural language inference, question answering, syntactic dependency, etc. However, these approaches are limited by either their high computational complexity or the uncertainty introduced by multi-component pipelines, resulting in only partial agreement with human judgement. Most recently, large language models (LLMs) have shown excellent performance not only in text generation but also in language comprehension. In this paper, we particularly explore ChatGPT's ability to evaluate factual inconsistency under a zero-shot setting by examining it on both…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
