# Enhancing Health Fact-Checking with LLM-Generated Synthetic Data

**Authors:** Jingze Zhang, Jiahe Qian, Yiliang Zhou, Yifan Peng

arXiv: 2508.20525 · 2025-08-29

## TL;DR

This paper introduces a synthetic data generation pipeline that uses large language models to augment training data for health-related fact-checking. Compared with training on the original data alone, it improves F1 by up to 0.019 on PubHealth and up to 0.049 on SciFact.

## Contribution

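The study presents an LLM-based synthetic data augmentation method for health fact-checking, aimed at settings where annotated training data is scarce. Source documents are summarized, the summaries are decomposed into atomic facts, and an LLM-built sentence-fact entailment table is used to generate labeled text-claim pairs that supplement the original data when fine-tuning a BERT-based fact-checker.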

## Key findings

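- F1 increased by up to 0.019 on PubHealth compared to models trained only on the original data
- F1 increased by up to 0.049 on SciFact under the same comparison
- Combining synthetic text-claim pairs with the original training data improved a BERT-based fact-checker on both datasets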
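
To put the reported gains in context, here is a minimal sketch of how a binary F1 score and the difference between two models can be computed. The confusion-matrix counts are made-up toy values, not results from the paper, and the helper name `f1_score` is hypothetical. This summary does not state which F1 variant (binary, macro, or micro) the paper reports, so the sketch assumes the plain binary form.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 from true positives, false positives, and false negatives.

    F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    Returns 0.0 when there are no positive predictions or labels at all.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy counts (assumed for illustration only, not taken from the paper).
baseline_f1 = f1_score(tp=80, fp=20, fn=20)   # model trained on original data
augmented_f1 = f1_score(tp=85, fp=18, fn=15)  # model trained on original + synthetic data

# The "increase in F1" is the plain difference between the two scores.
f1_gain = augmented_f1 - baseline_f1
print(f"baseline={baseline_f1:.3f} augmented={augmented_f1:.3f} gain={f1_gain:.3f}")
```

An absolute gain of 0.019 or 0.049 is read on this same 0-to-1 scale.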

## Abstract

Fact-checking for health-related content is challenging due to the limited availability of annotated training data. In this study, we propose a synthetic data generation pipeline that leverages large language models (LLMs) to augment training data for health-related fact-checking. In this pipeline, we summarize source documents, decompose the summaries into atomic facts, and use an LLM to construct sentence-fact entailment tables. From the entailment relations in the table, we further generate synthetic text-claim pairs with binary veracity labels. These synthetic data are then combined with the original data to fine-tune a BERT-based fact-checking model. Evaluation on two public datasets, PubHealth and SciFact, shows that our pipeline improved F1 scores by up to 0.019 and 0.049, respectively, compared to models trained only on the original data. These results highlight the effectiveness of LLM-driven synthetic data augmentation in enhancing the performance of health-related fact-checkers.
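
The last two pipeline steps described in the abstract (entailment table to labeled text-claim pairs, then merging with the original data) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Pair` type, the function `pairs_from_entailment_table`, and the pairing rule (sentences that entail a fact form a supporting text, the remaining sentences form a non-supporting text) are all hypothetical. The table is filled in by hand here, whereas the paper uses an LLM to construct it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    text: str    # evidence text shown to the fact-checker
    claim: str   # claim to verify against the text
    label: bool  # assumed binary veracity label: True = text supports claim


def pairs_from_entailment_table(sentences, facts, table):
    """Build text-claim pairs from a sentence-fact entailment table.

    table[i][j] is True when sentences[i] entails facts[j].
    Pairing rule (an assumption of this sketch): for each fact, the
    entailing sentences give a True pair and the rest give a False pair.
    """
    pairs = []
    for j, fact in enumerate(facts):
        supporting = [s for i, s in enumerate(sentences) if table[i][j]]
        other = [s for i, s in enumerate(sentences) if not table[i][j]]
        if supporting:
            pairs.append(Pair(" ".join(supporting), fact, True))
        if other:
            pairs.append(Pair(" ".join(other), fact, False))
    return pairs


# Toy inputs; in the paper these come from LLM summarization and decomposition.
sentences = [
    "Drug A lowered blood pressure in the trial.",
    "The trial enrolled 200 adults.",
]
facts = [
    "Drug A lowers blood pressure.",
    "The trial had 200 participants.",
]
table = [
    [True, False],   # sentence 0 entails fact 0 only
    [False, True],   # sentence 1 entails fact 1 only
]

synthetic = pairs_from_entailment_table(sentences, facts, table)

# Augmentation step: synthetic pairs are added to the original annotated data.
original = [Pair("Some annotated evidence text.", "Some annotated claim.", True)]
train_set = original + synthetic
```

The combined `train_set` would then be used to fine-tune the classifier; the fine-tuning step is omitted because this summary gives no training details beyond "BERT-based".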

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20525/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20525/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/2508.20525/full.md

---
Source: https://tomesphere.com/paper/2508.20525