From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence

Premtim Sahitaj, Jawan Kolanowski, Ariana Sahitaj, Veronika Solopova, Max Upravitelev, Daniel Röder, Iffat Maab, Junichi Yamagishi, Sebastian Möller, Vera Schmitt

arXiv:2605.06006 · cs.CL · May 8, 2026

TL;DR

PrimeFacts introduces a methodology and resource for extracting structured, stand-alone evidence from fact-checking articles. The extracted evidence substantially improves both evidence retrieval and automated claim verification.

Contribution

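It presents a dataset of 13,106 PolitiFact articles with claims, verdicts, and referenced sources, together with a framework that uses large language models to rewrite hyperlink-anchored sentences into stand-alone premises, which improves retrieval and verification performance.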

Findings

1. Decontextualized premises improve evidence retrievability by up to 30% in MRR.
2. Using the extracted premises increases claim-verification Macro-F1 by 10 to 20 points.
3. Qualitative analysis indicates that the extracted premises remain faithful to the original sources.
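The findings are reported in MRR (for retrieval) and Macro-F1 (for verification). As a reminder of what these metrics measure, here is a minimal sketch of both. The function names, the claim and premise IDs, and the verdict labels in the usage lines are illustrative assumptions; this summary does not specify the paper's exact evaluation setup (label set, retrieval depth, or how multiple gold premises per claim are handled).

```python
def mean_reciprocal_rank(rankings, gold):
    """MRR: mean over queries of 1/rank of the first relevant item (0 if none is retrieved).

    rankings maps a query ID (e.g. a claim) to its ranked list of candidate IDs;
    gold maps the same query ID to the set of relevant candidate IDs.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranked_ids in rankings.items():
        relevant = gold.get(query_id, set())
        for rank, candidate_id in enumerate(ranked_ids, start=1):
            if candidate_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def macro_f1(y_true, y_pred):
    """Macro-F1: unweighted mean of per-label F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Usage with made-up IDs and labels:
rankings = {"claim-1": ["p3", "p1", "p7"], "claim-2": ["p2", "p9"]}
gold = {"claim-1": {"p1"}, "claim-2": {"p2"}}
print(mean_reciprocal_rank(rankings, gold))  # (1/2 + 1/1) / 2 = 0.75
print(macro_f1(["true", "false", "false"], ["true", "false", "true"]))  # ≈ 0.667
```

Because Macro-F1 weights every verdict label equally, gains on rare labels move it as much as gains on frequent ones.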

Abstract

Fact-checking articles encode rich supporting evidence and reasoning, yet this evidence remains largely inaccessible to automated verification systems due to its unstructured presentation. We introduce PrimeFacts, a methodology and resource for extracting fine-grained evidence from full fact-checking articles. We compile 13,106 PolitiFact articles with claims, verdicts, and all referenced sources, and we identify 49,718 in-article hyperlinks as natural anchors to pinpoint key evidence. Our framework leverages large language models (LLMs) to rewrite these anchor sentences into stand-alone, context-independent premises, and we also investigate the extraction of additional implicit evidence. In evaluations on cross-article evidence retrieval and claim verification, the extracted premises substantially improve performance. Decontextualized evidence yields higher retrievability, achieving up to a 30…
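The abstract describes two steps: locating the sentences that carry in-article hyperlinks, then having an LLM rewrite each one into a stand-alone premise. Below is a minimal sketch of the first step plus a prompt builder for the second, assuming articles are available as HTML with links inside `<p>` elements. The class and function names, the naive regex sentence splitter, and the prompt wording are hypothetical illustrations, not the PrimeFacts implementation, and the LLM call itself is omitted.

```python
import re
from html.parser import HTMLParser

# Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class ParagraphLinkParser(HTMLParser):
    """Hypothetical helper: collects each <p>'s text and the character offsets of its links."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # list of (paragraph_text, [(offset, href), ...])
        self._in_p = False
        self._chunks = []
        self._links = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p, self._chunks, self._links = True, [], []
        elif tag == "a" and self._in_p:
            href = dict(attrs).get("href")
            if href:
                # Offset at which the link text begins within the paragraph text.
                self._links.append((len("".join(self._chunks)), href))

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append(("".join(self._chunks), self._links))
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def sentence_spans(text):
    """Return (start, end) character spans of the sentences in text."""
    spans, start = [], 0
    for match in SENTENCE_END.finditer(text):
        spans.append((start, match.start()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def extract_anchor_sentences(html):
    """Return every <p> sentence that contains at least one hyperlink, with hrefs and context."""
    parser = ParagraphLinkParser()
    parser.feed(html)
    parser.close()
    anchors = []
    for text, links in parser.paragraphs:
        for start, end in sentence_spans(text):
            hrefs = [href for offset, href in links if start <= offset < end]
            if hrefs:
                anchors.append({"sentence": text[start:end], "hrefs": hrefs, "context": text})
    return anchors


def build_premise_prompt(claim, anchor):
    """Hypothetical prompt asking an LLM to decontextualize one anchor sentence."""
    return (
        "Rewrite the sentence as a stand-alone premise that is understandable without "
        "the article. Resolve pronouns and vague references using the context, and do "
        "not add information that the context does not state.\n"
        f"Claim: {claim}\n"
        f"Context: {anchor['context']}\n"
        f"Sentence: {anchor['sentence']}\n"
        "Premise:"
    )
```

For example, given `<p>The bill passed in 2021. <a href="https://example.org/vote">Records show</a> 52 senators voted yes.</p>`, only the second sentence is returned as an anchor, together with its URL and the full paragraph as context for the rewrite. Passing the surrounding paragraph is one simple way to give the model what it needs to resolve references; a sentence splitter that handles abbreviations would be needed for real articles.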
