Training and Evaluation of Guideline-Based Medical Reasoning in LLMs

Michael Staniek; Artem Sokolov; Stefan Riezler

arXiv:2512.03838 · cs.CL · December 4, 2025

Open Access

TL;DR

This paper teaches large language models to follow medical guidelines step-by-step, improving their interpretability and correctness in medical reasoning, especially for complex definitions like Sepsis-3, through fine-tuning on verbalized rule instantiations.

Contribution

It introduces a method for fine-tuning LLMs with verbalized medical rules, enabling faithful and correct medical reasoning and automatic evaluation of inference correctness.

Findings

01. Fine-tuned small models outperform larger prompted models.

02. Nearly perfect rule derivation on unseen data within the same medical area.

03. Multimodal integration improves forecasting of clinical variables.

Abstract

Machine learning for early prediction in medicine has recently shown breakthrough performance; however, the focus on improving prediction accuracy has led to a neglect of the faithful explanations that are required to gain the trust of medical practitioners. The goal of this paper is to teach LLMs to follow medical consensus guidelines step-by-step in their reasoning and prediction process. Since consensus guidelines are ubiquitous in medicine, instantiations of verbalized medical inference rules to electronic health records provide data for fine-tuning LLMs to learn consensus rules and possible exceptions thereof for many medical areas. Consensus rules also enable an automatic evaluation of the model's inference process regarding its derivation correctness (evaluating correct and faithful deduction of a conclusion from given premises) and value correctness (comparing predicted values…
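The distinction between derivation correctness and value correctness can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Step` record, the `toy_rule` threshold rule, the premise names `score_increase` and `suspected_infection`, and the reading of value correctness as a comparison of the model's stated values against reference values (the abstract is truncated at that point).

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One reasoning step emitted by a model (hypothetical structure)."""
    premises: dict    # values the model states it is reasoning from
    conclusion: bool  # conclusion the model draws from those values


def toy_rule(premises):
    # Hypothetical threshold rule standing in for a verbalized guideline rule.
    return premises["score_increase"] >= 2 and premises["suspected_infection"]


def derivation_correct(step, rule=toy_rule):
    # The conclusion must follow from the premises the model itself stated,
    # regardless of whether those premises match the patient record.
    return rule(step.premises) == step.conclusion


def value_correct(step, reference, tol=0.0):
    # Assumed reading: every stated premise value must match its reference
    # value within a tolerance.
    for name, ref in reference.items():
        if name not in step.premises:
            return False
        if abs(float(step.premises[name]) - float(ref)) > tol:
            return False
    return True


# The model applies the rule consistently but starts from a wrong value.
step = Step(premises={"score_increase": 3, "suspected_infection": True},
            conclusion=True)
reference = {"score_increase": 1, "suspected_infection": True}

print(derivation_correct(step))        # deduction is internally valid
print(value_correct(step, reference))  # stated values disagree with reference
```

In this toy case the two checks disagree: the deduction is valid given the stated premises, while one stated value differs from the reference. Keeping the checks separate makes it possible to tell a faulty deduction apart from a faulty input value.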

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Topic Modeling