FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data

Deren Lei; Yaxi Li; Siyao Li; Mengya Hu; Rui Xu; Ken Archer; Mingyu Wang; Emily Ching; Alex Deng

arXiv:2501.17144 · cs.CL · January 29, 2025

TL;DR

FactCG introduces a graph-based multi-hop reasoning approach for fact-checking that improves detection of hallucinations in LLM outputs, outperforming larger models on benchmark datasets.

Contribution

The paper proposes CG2C, a novel synthetic data generation method leveraging multi-hop reasoning on context graphs, enhancing fact-checking models like FactCG.
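To make the idea of multi-hop reasoning over a context graph concrete, here is a minimal toy sketch. It is not the CG2C procedure itself, whose details are not given in this summary. It assumes a context graph can be represented as (subject, relation, object) triples, and it chains triples whose object matches the next triple's subject. The triples, the `multi_hop_paths` helper, and all names are hypothetical illustrations.

```python
from collections import defaultdict

# Hypothetical toy context graph as (subject, relation, object) triples.
# These are illustrative only and are not taken from the paper's data.
TRIPLES = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
]


def multi_hop_paths(triples, hops):
    """Return all paths of exactly `hops` edges in which each edge's
    object is the subject of the next edge, never revisiting an entity."""
    # Index outgoing edges by their subject entity.
    outgoing = defaultdict(list)
    for s, r, o in triples:
        outgoing[s].append((s, r, o))

    # Start with every single edge as a 1-hop path, then extend.
    paths = [[t] for t in triples]
    for _ in range(hops - 1):
        extended = []
        for path in paths:
            tail = path[-1][2]
            visited = {path[0][0]} | {edge[2] for edge in path}
            for edge in outgoing.get(tail, []):
                if edge[2] not in visited:  # avoid cycles
                    extended.append(path + [edge])
        paths = extended
    return paths


two_hop = multi_hop_paths(TRIPLES, hops=2)
for path in two_hop:
    print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in path))
```

A claim that can only be verified by combining both edges of such a path (for example, linking the person to the country through the city) is the kind of statement that requires more than one hop of evidence.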

Findings

1. FactCG outperforms GPT-4-o on the LLM-Aggrefact benchmark.
2. Multi-hop reasoning improves factuality detection accuracy.
3. Graph-based training data improves model performance at a smaller model size.

Abstract

Prior research on training grounded factuality classification models to detect hallucinations in large language models (LLMs) has relied on public natural language inference (NLI) data and synthetic data. However, conventional NLI datasets are not well-suited for document-level reasoning, which is critical for detecting LLM hallucinations. Recent approaches to document-level synthetic data generation involve iteratively removing sentences from documents and annotating factuality using LLM-based prompts. While effective, this method is computationally expensive for long documents and limited by the LLM's capabilities. In this work, we analyze the differences between existing synthetic training data used in state-of-the-art models and real LLM output claims. Based on our findings, we propose a novel approach for synthetic data generation, CG2C, that leverages multi-hop reasoning on…

Code & Models

Repositories

derenlei/factcg (PyTorch, official)

Models

yaxili96/FactCG-DeBERTa-v3-Large (Hugging Face model · 3.8k downloads · 3 likes)

Videos

FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data (Underline)

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Data Mining Algorithms and Applications