Assessing the Reasoning Capabilities of LLMs in the context of Evidence-based Claim Verification

John Dougrez-Lewis; Mahmud Elahi Akhter; Federico Ruggeri; Sebastian Löbbers; Yulan He; Maria Liakata

arXiv:2402.10735·cs.CL·June 18, 2025·3 cites

TL;DR

This paper introduces RECV, a novel benchmark for evaluating LLMs' reasoning in claim verification. It finds that LLMs handle deductive reasoning well, struggle with abductive reasoning, and do not consistently benefit from rationale generation.

Contribution

The paper presents a new framework for breaking down claims into reasoning types and creates the first benchmark for assessing LLMs' reasoning in evidence-based claim verification.
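As a rough illustration of what scoring by reasoning type could look like, the sketch below tags each claim-evidence pair with a reasoning type and computes accuracy per type. It is a minimal sketch under stated assumptions, not the RECV data format or the authors' evaluation code: the `Example` fields, the `ReasoningType` enum, the label strings, and `accuracy_by_reasoning_type` are all hypothetical names introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ReasoningType(Enum):
    # The two reasoning types named in the summary; the enum itself is hypothetical.
    DEDUCTIVE = "deductive"
    ABDUCTIVE = "abductive"


@dataclass(frozen=True)
class Example:
    # Hypothetical record layout for one claim paired with its evidence.
    claim: str
    evidence: str
    reasoning_type: ReasoningType
    gold_label: str  # assumed label strings, e.g. "supported" / "refuted"


def accuracy_by_reasoning_type(examples, predictions):
    """Return {ReasoningType: accuracy} for parallel lists of examples and predicted labels."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        total[example.reasoning_type] += 1
        correct[example.reasoning_type] += int(predicted == example.gold_label)
    # Only reasoning types that actually occur appear in the result.
    return {rtype: correct[rtype] / total[rtype] for rtype in total}
```

Reporting accuracy separately per type, rather than as one pooled number, is what would let a gap between deductive and abductive cases show up at all.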

Findings

01

LLMs perform well on deductive reasoning tasks.

02

LLMs struggle with abductive reasoning.

03

Rationale generation does not always improve LLM performance.

Abstract

Although LLMs have shown great performance on mathematics- and coding-related reasoning tasks, their capabilities in other forms of reasoning remain an open problem. Here, we examine the issue of reasoning from the perspective of claim verification. We propose a framework designed to break down any claim paired with evidence into the atomic reasoning types that are necessary for verification. We use this framework to create RECV, the first claim verification benchmark, incorporating real-world claims, to assess the deductive and abductive reasoning capabilities of LLMs. The benchmark comprises three datasets, covering reasoning problems of increasing complexity. We evaluate three state-of-the-art proprietary LLMs under multiple prompt settings. Our results show that while LLMs can address deductive reasoning problems, they consistently fail in cases of abductive…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education

Methods: Sparse Evolutionary Training · Label Smoothing · Graph Self-Attention · Linear Layer · RAdam · Absolute Position Encodings · Attention Dropout · Byte Pair Encoding