Retrieval-Reasoning Large Language Model-based Synthetic Clinical Trial Generation

Zerui Xu; Fang Wu; Yingzhou Lu; Yuanyuan Zhang; Yue Zhao

arXiv:2410.12476 · cs.CL · March 27, 2026

TL;DR

This paper introduces a Retrieval-Reasoning framework using large language models to generate synthetic clinical trial reports, enhancing data availability for clinical research while preserving privacy.

Contribution

The paper presents a retrieval- and reasoning-based method that uses LLMs to generate realistic synthetic clinical trial data, which in turn improves outcome prediction models.

Findings

1. Synthetic trials effectively augment real datasets

2. Hybrid fine-tuning improves clinical outcome prediction

3. Synthetic data preserves privacy while supporting research

Abstract

Machine learning (ML) holds great promise for clinical applications but is often hindered by limited access to high-quality data due to privacy concerns, high costs, and long timelines associated with clinical trials. While large language models (LLMs) have demonstrated strong performance in general-purpose generation tasks, their application to synthesizing realistic clinical trials remains underexplored. In this work, we propose a novel Retrieval-Reasoning framework that leverages few-shot prompting with LLMs to generate synthetic clinical trial reports annotated with binary success/failure outcomes. Our approach integrates a retrieval module to ground the generation on relevant trial data and a reasoning module to ensure domain-consistent justifications. Experiments conducted on real clinical trials from the ClinicalTrials.gov database demonstrate that the generated synthetic trials…
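
The retrieve-then-prompt idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trial` record, the token-overlap retriever, the prompt wording, and the toy corpus are all hypothetical placeholders chosen for illustration, and no LLM is actually called. The sketch only shows how retrieved reference trials might be assembled into a few-shot prompt that asks for a justification followed by a labeled synthetic report.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical, simplified trial record (not the paper's schema)."""
    summary: str
    outcome: str  # "success" or "failure"


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for any real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(query: str, corpus: list[Trial], k: int = 2) -> list[Trial]:
    """Return the k corpus trials most similar to the query."""
    ranked = sorted(corpus, key=lambda t: jaccard(query, t.summary), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: list[Trial], target_outcome: str) -> str:
    """Assemble a few-shot prompt asking for reasoning, then a labeled report."""
    shots = "\n\n".join(
        f"Trial: {t.summary}\nOutcome: {t.outcome}" for t in examples
    )
    return (
        "Below are reference clinical trials with their outcomes.\n\n"
        f"{shots}\n\n"
        f"Topic: {query}\n"
        f"First explain why a trial on this topic might end in {target_outcome}, "
        f"then write a new synthetic trial report with Outcome: {target_outcome}."
    )


# Toy placeholder corpus; these entries are invented for the example.
corpus = [
    Trial("phase 2 trial of drug A for type 2 diabetes", "success"),
    Trial("phase 3 trial of drug B for breast cancer", "failure"),
    Trial("phase 2 trial of drug C for type 2 diabetes in adults", "failure"),
]
query = "phase 2 trial for type 2 diabetes"
examples = retrieve(query, corpus, k=2)
prompt = build_prompt(query, examples, "failure")
print(prompt)
```

In a fuller pipeline, the prompt string would be sent to an LLM and the returned report stored together with its binary label. The token-overlap retriever is used here only to keep the sketch dependency-free; an embedding-based retriever would be a natural substitute.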
