Can LLMs extract human-like fine-grained evidence for evidence-based fact-checking?

Antonín Jarolím; Martin Fajčík; Lucia Makaiová

arXiv:2511.21401·cs.CL·November 27, 2025

Open Access

TL;DR

This study evaluates how well large language models extract precise, human-like evidence spans from text to support fact-checking of Czech and Slovak claims. It finds that models often fail to copy evidence verbatim and that model size alone does not predict alignment with human annotations.

Contribution

The paper introduces a new dataset for fine-grained evidence extraction in Czech and Slovak and assesses LLMs' performance, revealing their limitations and strengths in evidence copying and alignment.

Findings

01

Llama3.1:8b achieves high accuracy despite small size

02

GPT-oss-120b underperforms relative to size

03

Qwen3:14b, DeepSeek-R1:32b, GPT-oss:20b show good size-performance balance

Abstract

Misinformation frequently spreads in user comments under online news articles, highlighting the need for effective methods to detect factually incorrect information. To strongly support or refute claims extracted from such comments, it is necessary to identify relevant documents and pinpoint the exact text spans that justify or contradict each claim. This paper focuses on the latter task: fine-grained evidence extraction for Czech and Slovak claims. We create a new dataset containing two-way annotated fine-grained evidence created by paid annotators. We evaluate large language models (LLMs) on this dataset to assess their alignment with human annotations. The results reveal that LLMs often fail to copy evidence verbatim from the source text, leading to invalid outputs. Error-rate analysis shows that the llama3.1:8b model achieves a high proportion of correct outputs despite its…
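
The abstract's notion of an invalid output (evidence that is not copied verbatim from the source) can be illustrated with a minimal sketch. The function names `is_valid_extraction` and `invalid_rate`, the exact-substring criterion, and the example sentence are assumptions made for illustration; the paper's actual validity check and error-rate definition are not given on this page and may differ.

```python
from typing import Iterable


def is_valid_extraction(spans: Iterable[str], source: str) -> bool:
    """Assumed criterion: every span is non-empty and occurs verbatim in source."""
    spans = list(spans)
    if not spans:
        return False  # an output with no evidence spans is treated as invalid here
    return all(span != "" and span in source for span in spans)


def invalid_rate(outputs: list[list[str]], sources: list[str]) -> float:
    """Fraction of model outputs that contain at least one non-verbatim span."""
    if len(outputs) != len(sources):
        raise ValueError("outputs and sources must have the same length")
    if not outputs:
        return 0.0
    invalid = sum(
        not is_valid_extraction(spans, source)
        for spans, source in zip(outputs, sources)
    )
    return invalid / len(outputs)


# Hypothetical example: the second output drops diacritics, so it is not verbatim.
source = "Praha je hlavní město České republiky."
outputs = [["hlavní město"], ["hlavni mesto"]]
rate = invalid_rate(outputs, [source, source])
```

Exact substring matching is the strictest reading of "verbatim"; a more lenient check (for example, after whitespace normalization) would be a different design choice and would lower the measured error rate.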

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling · Computational and Text Analysis Methods