Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation
N. E. Kriman

TL;DR
This paper introduces a method for evaluating the factual accuracy of LLM-generated summaries using atomic-fact entailment metrics, targeting hallucination in retrieval-augmented generation.
Contribution
It proposes a Naive Bayes-based approach to measure factuality of summaries, providing a new metric for assessing LLM output accuracy.
Findings
Effective in detecting factual inaccuracies in summaries
Improves reliability of LLM-generated content
Addresses hallucination problem in retrieval augmented generation
Abstract
The use of large language models (LLMs) has significantly increased since the introduction of ChatGPT in 2022, demonstrating their value across various applications. However, a major challenge for enterprise and commercial adoption of LLMs is their tendency to generate inaccurate information, a phenomenon known as "hallucination." This project proposes a method for estimating the factuality of a summary generated by LLMs when compared to a source text. Our approach utilizes Naive Bayes classification to assess the accuracy of the content produced.
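The abstract does not say how the atomic facts are extracted or which features feed the classifier, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes that each sentence of the summary counts as one atomic fact, that a crude token-overlap test stands in for a real entailment model, and that the per-fact judgments are combined under the Naive Bayes independence assumption. All function names, the overlap threshold, the prior, and the likelihood values are hypothetical choices made for illustration.

```python
import math
import re


def split_atomic_facts(summary):
    """Assumption: one sentence = one atomic fact (a crude stand-in)."""
    parts = re.split(r"(?<=[.!?])\s+", summary)
    return [p.strip() for p in parts if p.strip()]


def is_entailed(fact, source, threshold=0.9):
    """Hypothetical entailment check: share of fact tokens found in the source.

    A real system would use an entailment (NLI) model here; token overlap
    is only a placeholder so the sketch stays self-contained.
    """
    fact_tokens = set(re.findall(r"\w+", fact.lower()))
    source_tokens = set(re.findall(r"\w+", source.lower()))
    if not fact_tokens:
        return False
    return len(fact_tokens & source_tokens) / len(fact_tokens) >= threshold


def factuality_posterior(judgments, prior=0.5,
                         p_entailed_given_factual=0.9,
                         p_entailed_given_hallucinated=0.3):
    """Combine per-fact judgments with Bayes' rule, treating them as independent.

    The prior and both likelihoods are illustrative values, not estimates
    from the paper; in practice they would be fitted on labeled summaries.
    """
    log_factual = math.log(prior)
    log_hallucinated = math.log(1.0 - prior)
    for entailed in judgments:
        if entailed:
            log_factual += math.log(p_entailed_given_factual)
            log_hallucinated += math.log(p_entailed_given_hallucinated)
        else:
            log_factual += math.log(1.0 - p_entailed_given_factual)
            log_hallucinated += math.log(1.0 - p_entailed_given_hallucinated)
    # Normalize in log space to avoid underflow when there are many facts.
    top = max(log_factual, log_hallucinated)
    factual = math.exp(log_factual - top)
    hallucinated = math.exp(log_hallucinated - top)
    return factual / (factual + hallucinated)


def score_summary(summary, source):
    """Return the estimated probability that the summary is factual."""
    facts = split_atomic_facts(summary)
    return factuality_posterior([is_entailed(f, source) for f in facts])


if __name__ == "__main__":
    source = "The Eiffel Tower is in Paris. It was completed in 1889."
    print(score_summary("The Eiffel Tower is in Paris.", source))
    print(score_summary("The Eiffel Tower is in Berlin. It was built by aliens.", source))
```

In this sketch a summary whose facts are all supported by the source scores above the prior, while one with unsupported facts scores below it. Swapping the overlap test for a real entailment model would change only `is_entailed`.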
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
