An Extensive Evaluation of Factual Consistency in Large Language Models for Data-to-Text Generation

Joy Mahapatra; Utpal Garain

arXiv:2411.19203·cs.CL·December 2, 2024

Open Access

TL;DR

This paper provides a comprehensive evaluation of factual consistency in large language models for data-to-text generation, analyzing multiple datasets, models, and metrics to identify factors influencing factual accuracy.

Contribution

It offers the first extensive evaluation of LLM factual consistency in DTG, comparing multiple models, datasets, and metrics, including human assessments.

Findings

01. Llama 2 often outperforms other models in factual consistency.

02. Larger models generally have higher factual consistency.

03. Source-reference divergence reduces factual consistency.

Abstract

Large Language Models (LLMs) have shown exceptional performance across various Data-to-Text Generation (DTG) tasks. However, generating factually consistent text in DTG remains challenging for LLMs. Despite this, in-depth evaluations of LLM factual consistency for DTG remain missing in the current literature. This paper addresses this gap by providing an extensive evaluation of factual consistency in LLMs for DTG. Our evaluation covers five widely used DTG datasets (E2E, ViGGo, WikiTableText, DART, and WebNLG) and five prominent LLM families (T5, BART, OPT, BLOOM, and Llama 2). To ensure a thorough evaluation of factual consistency, we use four state-of-the-art automatic metrics and include essential human assessments. Our extensive evaluation reveals three key findings regarding factual consistency in LLMs for DTG. First, Llama 2 often excels in generating factually consistent text,…

Taxonomy

Topics: Topic Modeling · Data Quality and Management · Software Engineering Research

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Attention Dropout · Multi-Head Attention · Byte Pair Encoding · Gated Linear Unit · Residual Connection · Softmax