# Large language models and conditional rules in clinical decision support systems

**Authors:** Shangeetha Sivasothy, Adrian Bingham, Irini Logothetis, Scott Barnett, Mohamed Abdelrazek, Carl Luckhoff, Joseph Mathew, Rajesh Vasa, Kon Mouzakis

PMC · DOI: 10.1007/s13755-026-00428-z · 2026-01-21

## TL;DR

This paper explores how large language and reasoning models can help create clinical decision rules, reducing the need for repeated clinician-developer collaboration.

## Contribution

The study evaluates LLMs and LRMs for generating triaging rules in CDSS, comparing their accuracy, interpretability, and complexity to a clinical rule set.

## Key findings

- When PiMS variables were included in prompts, LLM-generated rule sets scored lower on both interpretability and rule complexity than the PiMS clinical rule set.
- LLMs and LRMs showed varying accuracy, with LRMs achieving up to 81.70% accuracy but still falling short of clinical standards.
- Using LLMs and LRMs can reduce the time needed for rule refinement by providing a feasible initial rule set.

## Abstract

Clinical Decision Support Systems (CDSS) improve patient outcomes and support sustainable health services by enhancing medical decisions. Developing rules for a CDSS is expensive: capturing and defining the rules takes multiple iterations between clinicians and developers, and these iterations are delayed because a clinician's primary role is patient care.

We investigate the effectiveness of large language models (LLMs) and large reasoning models (LRMs) in generating a triaging rule set for a CDSS.

We prompt various LLMs (GPT-3.5, GPT-4, GPT-4o, Gemini, Claude 3.5 Sonnet) and various LRMs (GPT-o1-mini, Grok-4, Claude 4 Sonnet) using alternative prompting techniques. We compare the generated rule sets against the clinical rule set from our Pandemic Intervention Monitoring System (PiMS), a triaging CDSS built in collaboration with clinicians to monitor COVID-19-positive patients. Effectiveness is evaluated based on accuracy, interpretability, and rule complexity.
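As a rough illustration of the accuracy comparison, the sketch below treats a rule set as an ordered list of conditional rules and scores a candidate rule set by how often its triage level agrees with a reference rule set. This is one plausible reading, not the paper's evaluation code: the `Rule` class, the `triage` and `accuracy` helpers, and every variable name, threshold, and triage level are invented for illustration and are not taken from PiMS.

```python
from dataclasses import dataclass
from typing import Callable

Patient = dict[str, float]


@dataclass
class Rule:
    """A conditional rule: if `condition` holds, assign `level`."""
    condition: Callable[[Patient], bool]
    level: str


def triage(rules: list[Rule], patient: Patient, default: str = "low") -> str:
    """Return the level of the first matching rule, or `default` if none match."""
    for rule in rules:
        if rule.condition(patient):
            return rule.level
    return default


def accuracy(candidate: list[Rule], reference: list[Rule],
             patients: list[Patient]) -> float:
    """Fraction of patients for which both rule sets assign the same level."""
    matches = sum(triage(candidate, p) == triage(reference, p) for p in patients)
    return matches / len(patients)


# Hypothetical rule sets: variables and thresholds are illustrative only.
reference_rules = [
    Rule(lambda p: p["spo2"] < 92, "high"),
    Rule(lambda p: p["temperature"] >= 38.0 and p["heart_rate"] > 100, "medium"),
]
candidate_rules = [
    Rule(lambda p: p["spo2"] < 92, "high"),
    Rule(lambda p: p["temperature"] >= 38.0, "medium"),  # simpler condition
]

# Hypothetical patient records.
patients = [
    {"spo2": 90, "temperature": 37.0, "heart_rate": 80},
    {"spo2": 97, "temperature": 38.5, "heart_rate": 110},
    {"spo2": 97, "temperature": 38.5, "heart_rate": 85},
    {"spo2": 98, "temperature": 36.8, "heart_rate": 70},
]

print(accuracy(candidate_rules, reference_rules, patients))
```

In this toy data the candidate drops the heart-rate condition, so it disagrees with the reference on the third patient only. A simpler rule can look reasonable on its own and still diverge from the reference on exactly the cases the extra condition was meant to separate.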

We found that, when the variables from our PiMS rule set were not specified, LLMs generated COVID-19 screening rule sets rather than triaging rule sets. When PiMS variables were included in our prompts, LLMs 1) produced rule sets with lower interpretability and rule complexity than the PiMS rule set, and 2) achieved an average accuracy between 31.62% ± 0.19% and 70.71% ± 0.02%. For LRMs, 1) interpretability varied between 3 and 94, compared to 41 for our PiMS rule set, and 2) average accuracy ranged between 31.62% ± 0.19% and 81.70% ± 0.05%.
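The "average ± spread" figures can be read as a summary over repeated runs of the same model and prompt. The sketch below shows one way to produce such a summary, assuming the ± value is a sample standard deviation; the abstract does not state this. The `summarise` helper and the run accuracies are hypothetical.

```python
from statistics import mean, stdev


def summarise(accuracies: list[float]) -> str:
    """Format per-run accuracies (fractions in [0, 1]) as 'mean% ± std%'."""
    return f"{mean(accuracies) * 100:.2f}% ± {stdev(accuracies) * 100:.2f}%"


# Hypothetical accuracies from three repeated runs; not values from the paper.
run_accuracies = [0.70, 0.72, 0.71]
print(summarise(run_accuracies))
```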

LLMs are limited in emulating clinical rule sets due to their simplicity and lack of complex reasoning. Despite LRMs improving effectiveness, they are still limited. LLMs and LRMs can generate a feasible initial rule set for CDSS. This can reduce time invested by clinicians and developers by minimising the number of iterations for refinement. Future work can explore integrating LLMs and LRMs with decision trees to improve effectiveness.

The online version contains supplementary material available at 10.1007/s13755-026-00428-z.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12824032/full.md

---
Source: https://tomesphere.com/paper/PMC12824032