Medical Reasoning with Large Language Models: A Survey and MR-Bench

Xiaohan Ren; Chenxiao Fan; Wenyin Ma; Hongliang He; Chongming Gao; Xiaoyan Zhao; Fuli Feng

arXiv:2604.08559 · cs.CL · April 13, 2026

TL;DR

This paper reviews medical reasoning with large language models, introduces MR-Bench for real-world clinical evaluation, and highlights gaps between current performance and clinical needs.

Contribution

The paper surveys medical reasoning methods for LLMs, organizes them into seven technical routes spanning training-based and training-free approaches, and introduces MR-Bench, a benchmark for evaluation in real-world clinical settings.

Findings

01

Models perform well on exam-style tasks but poorly on real clinical data.

02

MR-Bench reveals a significant gap between exam performance and clinical accuracy.

03

A unified cross-benchmark evaluation under consistent settings shows that clinical use requires stronger reasoning than current models provide.

Abstract

Large language models (LLMs) have achieved strong performance on medical exam-style tasks, motivating growing interest in their deployment in real-world clinical settings. However, clinical decision-making is inherently safety-critical, context-dependent, and conducted under evolving evidence. In such situations, reliable LLM performance depends not on factual recall alone, but on robust medical reasoning. In this work, we present a comprehensive review of medical reasoning with LLMs. Grounded in cognitive theories of clinical reasoning, we conceptualize medical reasoning as an iterative process of abduction, deduction, and induction, and organize existing methods into seven major technical routes spanning training-based and training-free approaches. We further conduct a unified cross-benchmark evaluation of representative medical reasoning models under a consistent experimental…
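As a rough illustration of what a unified cross-benchmark evaluation involves, the sketch below scores a single model on exam-style and clinical items with the same exact-match metric, then reports the difference in accuracy. It is a minimal sketch under stated assumptions, not the paper's protocol: the `Item` type, the two-way `exam`/`clinical` split, the exact-match scoring, the helper names, and the toy data are all hypothetical and do not come from MR-Bench.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class Item(NamedTuple):
    """One evaluation item (hypothetical schema, not the MR-Bench format)."""
    benchmark: str  # placeholder benchmark name
    kind: str       # "exam" or "clinical" (an assumed two-way split)
    question: str
    answer: str


def evaluate(model: Callable[[str], str], items: Iterable[Item]) -> Dict[str, float]:
    """Score one model on every item with the same normalized exact-match metric."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item in items:
        prediction = model(item.question).strip().lower()
        total[item.kind] += 1
        correct[item.kind] += int(prediction == item.answer.strip().lower())
    return {kind: correct[kind] / total[kind] for kind in total}


def exam_clinical_gap(accuracy: Dict[str, float]) -> float:
    """Exam accuracy minus clinical accuracy; positive means the model does better on exams."""
    return accuracy["exam"] - accuracy["clinical"]


# Toy placeholder data, invented for this example only.
ITEMS = [
    Item("toy-exam", "exam", "q1", "a"),
    Item("toy-exam", "exam", "q2", "b"),
    Item("toy-clinical", "clinical", "q3", "c"),
    Item("toy-clinical", "clinical", "q4", "d"),
]


def toy_model(question: str) -> str:
    """Stand-in for an LLM call: answers from a fixed lookup table."""
    return {"q1": "A", "q2": "b", "q3": "c", "q4": "x"}.get(question, "")


accuracy = evaluate(toy_model, ITEMS)
gap = exam_clinical_gap(accuracy)
print(accuracy, gap)
```

Holding the scoring function and answer normalization fixed across benchmarks is what makes the two accuracies comparable. A real setup would likely also fix prompts and decoding parameters, and would need a metric suited to free-text clinical answers rather than exact match.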
