Assessing the Reasoning Capabilities of LLMs in the context of Evidence-based Claim Verification
John Dougrez-Lewis, Mahmud Elahi Akhter, Federico Ruggeri, Sebastian Löbbers, Yulan He, Maria Liakata

TL;DR
This paper introduces RECV, a novel benchmark for evaluating LLMs' reasoning in claim verification. It finds that LLMs handle deductive reasoning well but struggle with abductive reasoning, and that rationale generation has nuanced effects on performance.
Contribution
The paper presents a new framework for breaking down claims into reasoning types and creates the first benchmark for assessing LLMs' reasoning in evidence-based claim verification.
Findings
LLMs perform well on deductive reasoning tasks.
LLMs struggle with abductive reasoning.
Rationale generation does not always improve LLM performance.
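The findings above compare performance across reasoning types. A minimal sketch of how such a per-type breakdown might be computed is shown here, assuming each benchmark item carries a reasoning-type tag, a gold label, and a model prediction. The function name, record layout, and label strings are hypothetical and not taken from the paper, and the records are placeholders rather than RECV data.

```python
from collections import defaultdict


def accuracy_by_reasoning_type(records):
    """Return accuracy per reasoning type.

    records: iterable of (reasoning_type, gold_label, predicted_label) tuples.
    This layout is an assumption made for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for reasoning_type, gold, predicted in records:
        total[reasoning_type] += 1
        if gold == predicted:
            correct[reasoning_type] += 1
    return {t: correct[t] / total[t] for t in total}


# Placeholder records for illustration only; not results from the paper.
toy_records = [
    ("deductive", "supported", "supported"),
    ("deductive", "refuted", "refuted"),
    ("abductive", "supported", "refuted"),
    ("abductive", "refuted", "refuted"),
]
scores = accuracy_by_reasoning_type(toy_records)
```

Grouping by reasoning type rather than reporting one aggregate score is what lets a gap between deductive and abductive cases show up at all.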
Abstract
Although LLMs have shown great performance on mathematics- and coding-related reasoning tasks, their capabilities regarding other forms of reasoning remain an open problem. Here, we examine the issue of reasoning from the perspective of claim verification. We propose a framework designed to break down any claim paired with evidence into the atomic reasoning types that are necessary for verification. We use this framework to create RECV, the first claim verification benchmark, incorporating real-world claims, to assess the deductive and abductive reasoning capabilities of LLMs. The benchmark comprises three datasets, covering reasoning problems of increasing complexity. We evaluate three state-of-the-art proprietary LLMs under multiple prompt settings. Our results show that while LLMs can address deductive reasoning problems, they consistently fail in cases of abductive…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: Sparse Evolutionary Training · Label Smoothing · Graph Self-Attention · Linear Layer · RAdam · Absolute Position Encodings · Attention Dropout · Byte Pair Encoding
