An Extensive Evaluation of Factual Consistency in Large Language Models for Data-to-Text Generation
Joy Mahapatra, Utpal Garain

TL;DR
This paper provides a comprehensive evaluation of factual consistency in large language models for data-to-text generation, analyzing multiple datasets, models, and metrics to identify factors influencing factual accuracy.
Contribution
It offers the first extensive evaluation of LLM factual consistency in DTG, comparing multiple models, datasets, and metrics, including human assessments.
Findings
Llama 2 often outperforms other models in factual consistency.
Larger models generally have higher factual consistency.
Source-reference divergence reduces factual consistency.
Abstract
Large Language Models (LLMs) have shown exceptional performance across various Data-to-Text Generation (DTG) tasks. However, generating factually consistent text in DTG remains challenging for LLMs, and in-depth evaluations of LLM factual consistency for DTG are still missing from the current literature. This paper addresses this gap by providing an extensive evaluation of factual consistency in LLMs for DTG. Our evaluation covers five widely used DTG datasets (E2E, ViGGo, WikiTableText, DART, and WebNLG) and five prominent LLM families (T5, BART, OPT, BLOOM, and Llama 2). To ensure a thorough evaluation of factual consistency, we use four state-of-the-art automatic metrics and include essential human assessments. Our extensive evaluations reveal three key findings regarding factual consistency in LLMs for DTG. First, Llama 2 often excels in generating factually consistent text,…
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Software Engineering Research
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Attention Dropout · Multi-Head Attention · Byte Pair Encoding · Gated Linear Unit · Residual Connection · Softmax
