# Applications of Large Language Models in Medical Research: From Systematic Reviews to Clinical Studies

**Authors:** Eun Jeong Gong, Chang Seok Bang, Yong Seok Shin

PMC · DOI: 10.3390/bioengineering13030365 · Bioengineering · 2026-03-20

## TL;DR

This paper reviews how large language models are being used in medical research, from systematic reviews to clinical studies, highlighting their benefits and limitations.

## Contribution

The paper provides a narrative synthesis of 2023–2025 evidence on LLM applications across medical research workflows (systematic reviews, scientific writing, and clinical research), emphasizing the need for human oversight.

## Key findings

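- In systematic reviews, LLMs reduce screening workload by 40% and reach 80–94% data extraction accuracy, but show only slight-to-moderate agreement (κ = 0.16–0.43) in risk-of-bias assessment.
- In scientific writing, reported hallucination rates are 47–55% for fabricated references and the prevalence of demographic bias exceeds 90%, so outputs require rigorous verification.
- LLMs assist with clinical research tasks such as statistical coding and protocol development but require human validation; excessive reliance on automated tools may cause cognitive offloading that compromises analytical capabilities.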

## Abstract

**Background:** Large Language Models (LLMs) are reshaping medical research workflows.

**Objective:** This narrative review synthesizes evidence on LLM applications across systematic reviews, scientific writing, and clinical research.

**Methods:** We reviewed literature from 2023–2025 examining LLM applications in medical research, identified through PubMed, Scopus, Web of Science, arXiv, medRxiv, and Google Scholar. Studies reporting empirical findings, methodological evaluations, or systematic analyses of LLM applications were included; editorials and commentaries without empirical data were excluded.

**Results:** In systematic reviews, LLMs achieve 80–94% data extraction accuracy and 40% reduction in screening workload, but show only slight-to-moderate agreement (κ = 0.16–0.43) in risk-of-bias assessment. In scientific writing, hallucination rates of 47–55% for fabricated references and over 90% prevalence of demographic bias require rigorous verification. For clinical research, LLMs assist with statistical coding and protocol development but require human validation. Critically, excessive reliance on automated tools may cause cognitive offloading that compromises analytical capabilities.

**Conclusions:** LLMs are powerful but unstable tools requiring constant verification. Success depends on maintaining human-in-the-loop approaches that preserve critical thinking while leveraging AI efficiency.
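The κ range quoted in the abstract can be hard to interpret without seeing how the statistic is computed. The sketch below is a minimal illustration of Cohen's kappa for two raters, for example a human reviewer and an LLM assigning risk-of-bias labels. It is not the authors' code: the labels in `human` and `llm` are invented toy data, the function names are hypothetical, and the verbal bands follow the commonly used Landis and Koch convention, which is assumed (not confirmed by this summary) to be what "slight-to-moderate" refers to.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal band for kappa (Landis and Koch convention; an assumption here)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy data (hypothetical, not from the paper): 20 studies rated low/high risk of bias.
human = ["low"] * 10 + ["high"] * 10
llm = ["low"] * 7 + ["high"] * 3 + ["high"] * 6 + ["low"] * 4

kappa = cohen_kappa(human, llm)
print(f"kappa = {kappa:.2f} ({landis_koch(kappa)})")
```

In this toy case the raters agree on 13 of 20 items (65%), yet half of that agreement is expected by chance, which is why kappa is much lower than raw agreement. Under the assumed bands, the abstract's endpoints 0.16 and 0.43 map to "slight" and "moderate" respectively.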

## Full-text entities

- **Diseases:** hallucination (MESH:D006212)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13024205/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC13024205/full.md

## References

127 references — full list in the complete paper: https://tomesphere.com/paper/PMC13024205/full.md

---
Source: https://tomesphere.com/paper/PMC13024205