# Insufficient reporting quality in large language model studies in the field of radiology

**Authors:** Pae Sun Suh, So Yeong Jeong, Daiju Ueda, Woo Hyun Shim, Hwon Heo, Chang-Yun Woo, Hyungjun Park, Chong Hyun Suh

PMC · DOI: 10.1186/s13244-026-02236-1 · Insights into Imaging · 2026-03-16

## TL;DR

Many studies on large language models in radiology fail to report essential details, making it hard to reproduce or evaluate their results.

## Contribution

This study identifies key reporting gaps in LLM research within radiology and emphasizes the need for standardized reporting guidelines.

## Key findings

- Only 27.6% of studies specified the version of the LLM used.
- Most studies lacked details on output probability and API usage.
- Reporting deficiencies persisted in studies published both before and after July 25, 2024.

## Abstract

Our systematic review aimed to evaluate the quality of reporting in research articles involving LLMs in the radiology field.

After searching the PubMed-MEDLINE and EMBASE databases, a total of 246 eligible studies published between November 30, 2022, and December 31, 2024, were included. The analysis assessed the percentage of studies adhering to key elements required for LLM research, based on the MInimum reporting items for CLear Evaluation of Accuracy Reports of Large Language Models in healthcare (MI-CLEAR-LLM) and the Transparent Reporting of a Multivariable Model for Individual Prognosis Or Diagnosis-large language models (TRIPOD-LLM) checklists. Studies published before and after July 25, 2024, were compared using a chi-square test.
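For a single reporting item, the before/after comparison described above can be framed as a 2x2 table: publication period (before or after July 25, 2024) by whether the item was reported. The sketch below is a minimal Pearson chi-square test for such a table, assuming that layout. The function name `chi_square_2x2` and the counts in the usage note are hypothetical, and the summary does not state whether the authors applied a continuity correction; this version applies none.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Hypothetical layout:
        rows    = publication period (before cutoff, after cutoff)
        columns = item reported, item not reported
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column total must be non-zero")
    # Closed-form 2x2 statistic: n * (ad - bc)^2 / (product of the four marginal totals)
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail p-value is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, with hypothetical counts where 20 of 30 earlier studies and 10 of 30 later studies report an item, `chi_square_2x2(20, 10, 10, 20)` returns a statistic of 20/3 (about 6.67) with p just below 0.01. `scipy.stats.chi2_contingency` with `correction=False` should give the same statistic; its default applies Yates' correction to 2x2 tables.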

The most common topic was performance evaluation of LLMs using radiologic cases (44.3%, 109/246), followed by radiology reporting (37.8%, 93/246). Although all studies reported the LLM's name, only 27.6% (68/246) specified the model version, 35.8% (88/246) mentioned the access date, and 25.2% (62/246) mentioned application programming interface (API) usage. Full prompts were provided in 41.1% (101/246) of studies. Output probability-related items, including the number of attempts (22.8%, 56/246) and parameters such as temperature (16.7%, 41/246), were under-reported. These reporting insufficiencies persisted in studies published both before and after July 25, 2024.
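All of the adherence figures above share one denominator, the 246 included studies. The short sketch below recomputes each percentage from the counts stated in the abstract, which can be handy when tallying the same items in a new audit. The dictionary keys and the helper name `adherence` are illustrative choices, not part of the MI-CLEAR-LLM or TRIPOD-LLM checklists.

```python
# Counts taken from the abstract; denominator is the 246 included studies.
TOTAL_STUDIES = 246

# Hypothetical item labels paired with the number of studies reporting each item.
REPORTED_COUNTS = {
    "model version": 68,
    "access date": 88,
    "API usage": 62,
    "full prompt": 101,
    "number of attempts": 56,
    "temperature": 41,
}


def adherence(count: int, total: int = TOTAL_STUDIES) -> float:
    """Percentage of studies reporting an item, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for item, count in REPORTED_COUNTS.items():
        print(f"{item}: {adherence(count)}% ({count}/{TOTAL_STUDIES})")
```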

Most studies assessing large language models in radiology lacked sufficient reporting of key elements required for large language model research. We recommend that authors strive to adhere to these elements to ensure transparency and improve the reproducibility of future studies.

Our study highlighted the need for improved reporting quality and adherence to key elements to ensure transparent reporting and improve the reproducibility of future studies using large language models.

Numerous studies on large language models (LLMs) in radiology lack standardized methodologies, leading to high variability and inconsistent reporting.

Our review demonstrated insufficiency in key elements for LLM research, particularly in model details and output probability.

Better reporting and adherence to key elements are essential for enhancing transparency and reproducibility in future LLM research.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12992711/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12992711/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/PMC12992711/full.md

---
Source: https://tomesphere.com/paper/PMC12992711