Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models
Majid Zarharan, Pascal Wullschleger, Babak Behkam Kia, Mohammad Taher Pilehvar, Jennifer Foster

TL;DR
This paper evaluates large language models' ability to verify and explain public health claims, comparing prompting and fine-tuning methods, and introduces a dual automatic and human evaluation approach.
Contribution
It provides a comprehensive analysis of explainable fact-checking with LLMs, highlighting performance differences across prompting and fine-tuning, and introduces a novel human evaluation framework.
Findings
GPT-4 excels in zero-shot verification and explanation.
Open-source models can match or surpass GPT-4 with fine-tuning.
Human evaluation uncovers issues with gold explanations.
Abstract
This paper presents a comprehensive analysis of explainable fact-checking through a series of experiments, focusing on the ability of large language models to verify public health claims and provide explanations or justifications for their veracity assessments. We examine the effectiveness of zero/few-shot prompting and parameter-efficient fine-tuning across various open and closed-source models, examining their performance in both isolated and joint tasks of veracity prediction and explanation generation. Importantly, we employ a dual evaluation approach comprising previously established automatic metrics and a novel set of criteria through human evaluation. Our automatic evaluation indicates that, within the zero-shot scenario, GPT-4 emerges as the standout performer, but in few-shot and parameter-efficient fine-tuning contexts, open-source models demonstrate their capacity to not…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Multi-Head Attention · Dense Connections · Position-Wise Feed-Forward Layer · Dropout · Label Smoothing · Residual Connection · Absolute Position Encodings
