Reasoning LLMs in the Medical Domain: A Literature Survey
Armin Berger, Sarthak Khanna, David Berghaus, Rafet Sifa

TL;DR
This survey reviews the evolution of medical Large Language Models, highlighting their reasoning capabilities, technological foundations, evaluation methods, and challenges to guide future development in clinical applications.
Contribution
It provides a comprehensive analysis of medical LLMs, focusing on reasoning techniques, evaluation strategies, and challenges, establishing a roadmap for reliable clinical deployment.
Findings
Advanced reasoning enhances decision transparency in medical LLMs
Prompting techniques like Chain-of-Thought improve clinical reasoning
Evaluation methodologies face challenges in validation and bias mitigation
Abstract
The emergence of advanced reasoning capabilities in Large Language Models (LLMs) marks a transformative development in healthcare applications. Beyond merely expanding functional capabilities, these reasoning mechanisms enhance decision transparency and explainability, both critical requirements in medical contexts. This survey examines the transformation of medical LLMs from basic information retrieval tools to sophisticated clinical reasoning systems capable of supporting complex healthcare decisions. We provide a thorough analysis of the enabling technological foundations, with a particular focus on specialized prompting techniques like Chain-of-Thought and recent breakthroughs in Reinforcement Learning exemplified by DeepSeek-R1. Our investigation evaluates purpose-built medical frameworks while also examining emerging paradigms such as multi-agent collaborative systems and innovative…
