# Large Language Models for Clinical Trial Protocol Assessments

**Authors:** Euibeom Shin, Amruta Gajanan Bhat, Murali Ramanathan

PMC · DOI: 10.1002/cpt.70096 · Clinical Pharmacology and Therapeutics · 2025-10-21

## TL;DR

This study shows that large language models like ChatGPT can help assess clinical trial protocols, especially for statistical and pharmacological components.

## Contribution

The study introduces the use of LLMs for evaluating SAP and PK–PD components in clinical trial protocols.

## Key findings

- ChatGPT accurately identified trial details like disease and sample size for most trials.
- LLMs demonstrated satisfactory ability to extract and summarize technical protocol details.
- Some limitations in contextual accuracy were observed in LLM outputs.

## Abstract

The purpose was to evaluate the utility of large language models (LLMs) for reviewing the statistical analysis plan (SAP) and pharmacokinetics–pharmacodynamics (PK–PD) components of clinical trial protocols. Clinical trial protocols and SAPs were obtained from clinicaltrials.gov for a testbed of 15 small‐molecule drugs, biologics, and global antibiotic and public health interventions. The GPT‐4o (ChatGPT) LLM was used to elicit study design attributes, relevant guidelines, and detailed SAP evaluations with prompts engineered to the persona of a regulatory expert. The SAP methodology was assessed against the Food and Drug Administration’s (FDA) E9 Statistical Principles for Clinical Trials guidance. The SAP evaluation outputs were assessed in post hoc analyses with ChatGPT and Grok, based on a rubric that evaluated the accuracy of primary outcome identification, the correctness of statistical methodology, compliance with the FDA E9 guidance, and clinical interpretability. PK–PD analysis plans were assessed on the accuracy of PK–PD objectives and measures and PK analysis methods. ChatGPT accurately identified the disease, intervention, and comparator groups for all trials, as well as the study sample size for 14 out of 15 trials. The most frequently cited guidelines were the FDA’s E9 guidance for SAP and the FDA Guidance for Industry: Population Pharmacokinetics for PK–PD. ChatGPT outputs of the SAP and PK–PD analysis plans were clear and organized, demonstrating a satisfactory ability to extract and summarize technical details; some limitations in contextual accuracy were observed. LLMs can be effective tools for assessing the SAP, PK–PD, and other aspects of clinical trial protocol reviews.

## Full-text entities

- **Genes:** SH2D1A (SH2 domain containing 1A) [NCBI Gene 4068] {aka DSHP, EBVS, IMD5, LYP, MTCP1, SAP}, ITIH4 (inter-alpha-trypsin inhibitor heavy chain 4) [NCBI Gene 3700] {aka GP120, H4P, IHRP, ITI-HC4, ITIHL1, PK-120}, SKAP2 (src kinase associated phosphoprotein 2) [NCBI Gene 8935] {aka PRAP, RA70, SAPS, SCAP2, SKAP-HOM, SKAP55R}, CD274 (CD274 molecule) [NCBI Gene 29126] {aka ADMIO5, B7-H, B7H1, PD-L1, PDCD1L1, PDCD1LG1}
- **Diseases:** CHANGE (MESH:D009402), PD (MESH:D010300), PK (MESH:C564858), LLMs (MESH:D007806), ARTIFICIAL INTELLIGENCE (MESH:C538142), Multiple Sclerosis (MESH:D009103)
- **Chemicals:** maribavir (MESH:C400401), FDA E9 (-), MF59 (MESH:C089950)
- **Species:** Human immunodeficiency virus 1 (no rank) [taxon 11676], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12816432/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12816432/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12816432/full.md

---
Source: https://tomesphere.com/paper/PMC12816432