# Assessing The Factual Accuracy of Generated Text

**Authors:** Ben Goodrich, Vinay Rao, Mohammad Saleh, Peter J Liu

arXiv: 1905.13322 · 2021-05-27

## TL;DR

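This paper introduces a model-based metric for estimating the factual accuracy of generated text, intended to complement scoring schemes such as ROUGE and BLEU rather than replace them. It is supported by a new large-scale dataset built from Wikipedia and Wikidata, and a human evaluation study compares the metric with ROUGE and other model-free variants.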

## Contribution

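It presents a model-based factual accuracy metric, a large-scale dataset built from Wikipedia and Wikidata for training relation classifiers and end-to-end fact extraction models, and a human evaluation of several factual accuracy estimators on a Wikipedia text summarization task.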

## Key findings

- The model-based metric is complementary to ROUGE and BLEU, and its efficacy is shown in comparison with ROUGE and other model-free variants.
- New dataset enables training of relation classifiers and fact extraction models.
- Human evaluation confirms the effectiveness of the proposed metric.

## Abstract

We propose a model-based metric to estimate the factual accuracy of generated text that is complementary to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We introduce and release a new large-scale dataset based on Wikipedia and Wikidata to train relation classifiers and end-to-end fact extraction models. The end-to-end models are shown to be able to extract complete sets of facts from datasets with full pages of text. We then analyse multiple models that estimate factual accuracy on a Wikipedia text summarization task, and show their efficacy compared to ROUGE and other model-free variants by conducting a human evaluation study.
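The abstract does not spell out how a factual accuracy score is computed from extracted facts. As a rough illustration only, the sketch below assumes facts have already been extracted as (subject, relation, object) triples and scores a generated text by the fraction of its facts that also appear in the reference. The function name `fact_precision`, the triple representation, the scoring rule, and the example facts are all assumptions made for this sketch, not definitions taken from the paper.

```python
from typing import Iterable, Set, Tuple

# Assumption: a fact is modelled as a (subject, relation, object) triple.
Fact = Tuple[str, str, str]


def fact_precision(generated: Iterable[Fact], reference: Iterable[Fact]) -> float:
    """Fraction of facts in the generated text that also appear in the reference.

    Hypothetical scoring rule for illustration; returns 0.0 when the
    generated text yields no facts.
    """
    gen: Set[Fact] = set(generated)
    ref: Set[Fact] = set(reference)
    if not gen:
        return 0.0
    return len(gen & ref) / len(gen)


# Made-up example facts, not drawn from the paper's dataset.
reference_facts = {
    ("Ada Lovelace", "occupation", "mathematician"),
    ("Ada Lovelace", "place of birth", "London"),
}
generated_facts = {
    ("Ada Lovelace", "occupation", "mathematician"),
    ("Ada Lovelace", "place of birth", "Paris"),  # contradicts the reference
}

score = fact_precision(generated_facts, reference_facts)
print(score)  # 0.5: one of the two generated facts is supported
```

A set-overlap score of this kind depends on wording only through the fact extractor, which is one way such a measure could complement n-gram metrics like ROUGE and BLEU. In this sketch, all of the modelling effort sits in the extraction step, which is omitted here.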

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.13322/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1905.13322/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1905.13322/full.md

---
Source: https://tomesphere.com/paper/1905.13322