The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference

Maël Jullien, Marco Valentino, and André Freitas

arXiv:2508.10777 · cs.AI · August 15, 2025

TL;DR

This paper introduces a benchmark for evaluating LLM reasoning in clinical natural language inference. It finds that models often possess the relevant knowledge but lack the structured internal representations needed for reliable inference.

Contribution

It presents a novel Clinical Trial Natural Language Inference benchmark whose items are paired with targeted probes, so that failures of factual access in LLMs can be separated from failures of inference.
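Pairing each reasoning item with a knowledge probe lets every model error be attributed to one of two sources. The following is a minimal sketch of that bookkeeping, not the authors' code. The names `PairedItem`, `Outcome`, `classify`, and `inference_failure_rate` are hypothetical. The four-way outcome split, including the "ungrounded success" case, and the conditional failure rate are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Hypothetical outcome categories for one item/probe pair."""

    CORRECT = "correct"                # main inference right, probe passed
    UNGROUNDED_SUCCESS = "ungrounded"  # main inference right, probe failed
    INFERENCE_FAILURE = "inference"    # probe passed, main inference wrong
    KNOWLEDGE_FAILURE = "knowledge"    # probe failed, main inference wrong


@dataclass(frozen=True)
class PairedItem:
    """A reasoning item and its knowledge-probe result (field names are hypothetical)."""

    item_id: str
    main_correct: bool   # model answered the main NLI item correctly
    probe_correct: bool  # model passed the matching knowledge probe


def classify(item: PairedItem) -> Outcome:
    """Map the (main, probe) correctness pair to one outcome category."""
    if item.main_correct:
        return Outcome.CORRECT if item.probe_correct else Outcome.UNGROUNDED_SUCCESS
    return Outcome.INFERENCE_FAILURE if item.probe_correct else Outcome.KNOWLEDGE_FAILURE


def inference_failure_rate(items: list[PairedItem]) -> float:
    """Among items whose probe was passed, the share whose main inference failed."""
    grounded = [i for i in items if i.probe_correct]
    if not grounded:
        return 0.0
    failures = sum(classify(i) is Outcome.INFERENCE_FAILURE for i in grounded)
    return failures / len(grounded)


if __name__ == "__main__":
    items = [
        PairedItem("a", main_correct=False, probe_correct=True),
        PairedItem("b", main_correct=True, probe_correct=True),
        PairedItem("c", main_correct=False, probe_correct=False),
    ]
    print([classify(i).value for i in items])  # ['inference', 'correct', 'knowledge']
    print(inference_failure_rate(items))  # 0.5
```

Conditioning on passed probes isolates the dissociation the paper targets. A high rate means the model had the needed facts but still failed to reason with them.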

Findings

1. LLMs perform well on knowledge verification (mean GKMRV accuracy 0.918) but poorly on the main reasoning tasks (mean accuracy 0.25).

2. Their inferences are highly consistent across samples (mean 0.87) but often rely on heuristics and shortcuts.

3. Current LLMs lack the structured, composable representations needed for reliable reasoning.

Abstract

Large language models are often assumed to acquire increasingly structured, generalizable internal representations simply by scaling data and parameters. We interrogate this assumption by introducing a Clinical Trial Natural Language Inference benchmark comprising four reasoning families: Causal Attribution, Compositional Grounding, Epistemic Verification, and Risk State Abstraction. Each item is paired with a targeted Ground Knowledge and Meta-Level Reasoning Verification (GKMRV) probe, allowing us to dissociate failures of factual access from failures of inference. We evaluate six contemporary LLMs under both direct and chain-of-thought prompting. Models achieve near-ceiling GKMRV accuracy (mean accuracy 0.918) yet perform poorly on the main reasoning tasks (mean accuracy 0.25). Despite low accuracy, output inferences are highly consistent across samples (mean 0.87), indicating a…
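The abstract reports accuracy together with cross-sample consistency, but this excerpt does not say how consistency is computed. The following is a minimal sketch under one assumed definition: the share of sampled answers that match each item's modal (most frequent) answer, averaged over items. The function names and the Entailment/Contradiction labels are hypothetical illustrations, not the authors' evaluation code.

```python
from collections import Counter
from statistics import mean


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of items whose predicted label matches the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def item_consistency(samples: list[str]) -> float:
    """Assumed definition: share of sampled answers equal to the modal answer."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    modal_count = Counter(samples).most_common(1)[0][1]
    return modal_count / len(samples)


def mean_consistency(samples_per_item: list[list[str]]) -> float:
    """Average per-item consistency over the benchmark."""
    return mean(item_consistency(s) for s in samples_per_item)


if __name__ == "__main__":
    gold = ["Entailment", "Contradiction"]
    # Five sampled answers per item (hypothetical labels).
    samples = [
        ["Contradiction"] * 5,
        ["Entailment"] * 4 + ["Contradiction"],
    ]
    # Score each item by its modal answer.
    modal = [Counter(s).most_common(1)[0][0] for s in samples]
    print(accuracy(modal, gold))  # 0.0: wrong on both items
    print(mean_consistency(samples))  # 0.9
```

The toy case reproduces the pattern the abstract describes: the model is wrong on every item yet returns nearly the same answer on every sample. Low accuracy combined with high consistency points to a stable but flawed inference strategy rather than random noise.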
