Dissecting Clinical Reasoning in Language Models: A Comparative Study of Prompts and Model Adaptation Strategies

Mael Jullien; Marco Valentino; Leonardo Ranaldi; Andre Freitas

arXiv:2507.04142 · cs.CL · July 8, 2025

TL;DR

This study systematically evaluates how prompt design and lightweight fine-tuning influence clinical reasoning in language models, showing that prompt structure is a key factor and that small models can perform comparably to larger systems with proper prompts and adaptation.

Contribution

The paper provides the first controlled analysis of how prompt structure and efficient fine-tuning jointly affect clinical natural language inference (NLI), highlighting the importance of prompt structure and lightweight adaptation techniques.

Findings

01 Prompt type accounts for up to 44% of performance variance.

02 LoRA fine-tuning improves F1 scores by 8 to 12 points.

03 Compact models, given suitable prompts and LoRA fine-tuning, approach the performance of GPT-4o-mini.
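
For readers unfamiliar with how a gain in "F1 points" is measured, the following is a minimal sketch of macro-averaged F1 for a two-label NLI task. It assumes the labels are Entailment and Contradiction and that scores are macro-averaged; this summary does not state which averaging the paper uses, and the function names (`f1_for_label`, `macro_f1`, `f1_point_gain`) and toy predictions are hypothetical.

```python
from typing import Sequence

# Assumed label set for illustration; not taken from this summary.
LABELS = ("Entailment", "Contradiction")


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one label, treating that label as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives means precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


def f1_point_gain(before: float, after: float) -> float:
    """Difference between two F1 scores, expressed in points (F1 x 100)."""
    return 100.0 * (after - before)


# Toy data, invented for illustration only.
gold = ["Entailment", "Entailment", "Contradiction", "Contradiction"]
pred = ["Entailment", "Contradiction", "Contradiction", "Contradiction"]
score = macro_f1(gold, pred)
```

Under this convention, a model that moves from a macro-F1 of 0.60 to 0.70 has gained 10 points.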

Abstract

Recent works on large language models (LLMs) have demonstrated the impact of prompting strategies and fine-tuning techniques on their reasoning capabilities. Yet, their effectiveness on clinical natural language inference (NLI) remains underexplored. This study presents the first controlled evaluation of how prompt structure and efficient fine-tuning jointly shape model performance in clinical NLI. We inspect four classes of prompting strategies to elicit reasoning in LLMs at different levels of abstraction, and evaluate their impact on a range of clinically motivated reasoning types. For each prompting strategy, we construct high-quality demonstrations using a frontier model to distil multi-step reasoning capabilities into smaller models (4B parameters) via Low-Rank Adaptation (LoRA). Across different language models fine-tuned on the NLI4CT benchmark, we found that prompt type alone…
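
The abstract mentions Low-Rank Adaptation (LoRA) without describing it. As a rough illustration of why it counts as lightweight, here is a minimal sketch of the standard LoRA formulation, in which a frozen weight matrix W is augmented with a trainable low-rank product scaled by alpha / r. It is written in plain Python for readability and is not the authors' training code; the matrix shapes, the rank, and the helper names (`matvec`, `lora_forward`, `lora_trainable_params`) are assumptions chosen for the example.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x: Vector, w: Matrix, a: Matrix, b: Matrix,
                 alpha: float, r: int) -> Vector:
    """Compute h = W x + (alpha / r) * B (A x).

    w: frozen base weights, shape d_out x d_in
    a: trainable down-projection, shape r x d_in
    b: trainable up-projection, shape d_out x r
    """
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Number of trainable parameters in A and B for a single adapted layer."""
    return r * (d_in + d_out)


# Toy example with d_in = d_out = 2 and rank r = 1 (values are illustrative).
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]
b_zero = [[0.0], [0.0]]   # B is conventionally initialised to zero
b_tuned = [[1.0], [0.0]]  # a hypothetical value after some training
x = [2.0, 3.0]

h_init = lora_forward(x, w, a, b_zero, alpha=2.0, r=1)
h_tuned = lora_forward(x, w, a, b_tuned, alpha=2.0, r=1)
```

With B initialised to zero the adapted layer reproduces the base model exactly, so training starts from the pretrained behaviour. For a hypothetical 4096 x 4096 projection at rank 8, `lora_trainable_params` gives 65,536 trainable values against 16,777,216 in the full matrix.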
