# Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines

**Authors:** Michael S. Yao, Allison Chae, Piya Saraiya, Charles E. Kahn, Walter R. Witschey, James C. Gee, Hersh Sagreiya, Osbert Bastani

PMC · DOI: 10.1038/s43856-025-01061-9 · Communications Medicine · 2025-08-04

## TL;DR

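This study evaluates whether language models aligned with the American College of Radiology's Appropriateness Criteria can help emergency clinicians order appropriate diagnostic imaging. With the proposed framework, the models match or exceed clinician performance.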

## Contribution

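The paper introduces a framework for aligning language model recommendations with the American College of Radiology's Appropriateness Criteria for acute imaging decisions, together with RadCases, a dataset of over 1500 annotated patient case summaries.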

## Key findings

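- With the framework, state-of-the-art language models achieved accuracy comparable to clinicians in ordering imaging studies.
- Used as an assistant by clinicians, the language model pipeline improved the accuracy of acute image ordering as measured against the American College of Radiology's Appropriateness Criteria.
- The authors present guideline-aligned AI tools as a way to address the high variability in imaging orders among emergency department providers.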

## Abstract

Diagnostic imaging studies are increasingly important in the management of acutely presenting patients. However, ordering appropriate imaging studies in the emergency department is a challenging task with a high degree of variability among healthcare providers. To address this issue, recent work has investigated whether generative AI and large language models can be leveraged to recommend diagnostic imaging studies in accordance with evidence-based medical guidelines. However, it remains challenging to ensure that these tools can provide recommendations that correctly align with medical guidelines, especially given the limited diagnostic information available in acute care settings.

In this study, we introduce a framework to intelligently leverage language models by recommending imaging studies for patient cases that align with the American College of Radiology’s Appropriateness Criteria, a set of evidence-based guidelines. To power our experiments, we introduce RadCases, a dataset of over 1500 annotated case summaries reflecting common patient presentations, and apply our framework to enable state-of-the-art language models to reason about appropriate imaging choices.

Using our framework, state-of-the-art language models achieve accuracy comparable to clinicians in ordering imaging studies. Furthermore, we demonstrate that our language model-based pipeline can be used as an intelligent assistant by clinicians to support image ordering workflows and improve the accuracy of acute image ordering according to the American College of Radiology’s Appropriateness Criteria.

Our work demonstrates and validates a strategy to leverage AI-based software to improve trustworthy clinical decision-making in alignment with expert evidence-based guidelines.

Yao et al. evaluate the ability of large language models to order diagnostic images in acute patient care. By aligning model predictions with evidence-based guidelines, they show that language models can match or exceed clinician performance, and assist providers in making evidence-based imaging decisions.

Emergency room doctors often need to quickly decide which medical scans, such as X-rays or CT scans, to order for patients. However, these decisions can vary significantly among doctors. In this study, we looked at whether generative artificial intelligence (AI) can help recommend which scans patients should receive. We created a dataset of real patient cases to help AI tools follow expert medical guidelines when suggesting scans. Our results show that AI tools can accurately choose the right scans for patients and can also be helpful assistants for clinicians. In the future, we hope this work can support faster, more accurate decision-making and reduce unnecessary tests.

## Full-text entities

- **Genes:** FPR1 (formyl peptide receptor 1) [NCBI Gene 2357] {aka FMLP, FPR}
- **Diseases:** Torso Trauma (MESH:D014947), Major Blunt Trauma (MESH:D014949), alcohol withdrawal syndrome (MESH:D020270), Breast Pain (MESH:D059373), cardiac and gastrointestinal conditions (MESH:D005767), dermatologic condition (MESH:D000168), IDDM (MESH:D003922), Chronic cough (MESH:D003371), Pulmonary Embolism (MESH:D011655)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12322208/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12322208/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12322208/full.md

---
Source: https://tomesphere.com/paper/PMC12322208