SynLLM: A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering
Arshia Ilaty, Hossein Shirazi, Hajar Homayouni

TL;DR
SynLLM demonstrates that prompt engineering with large language models can generate high-quality, clinically valid, and privacy-preserving synthetic medical tabular data, facilitating safer healthcare data sharing.
Contribution
This work introduces SynLLM, a modular framework using structured prompts and comprehensive evaluation for generating realistic synthetic medical data with open-source LLMs.
Findings
Prompt engineering greatly influences data quality and privacy.
Rule-based prompts achieve the best privacy-quality trade-off.
LLMs can produce clinically plausible synthetic data with proper guidance.
Abstract
Access to real-world medical data is often restricted due to privacy regulations, posing a significant barrier to the advancement of healthcare research. Synthetic data offers a promising alternative; however, generating realistic, clinically valid, and privacy-conscious records remains a major challenge. Recent advancements in Large Language Models (LLMs) offer new opportunities for structured data generation; however, existing approaches frequently lack systematic prompting strategies and comprehensive, multi-dimensional evaluation frameworks. In this paper, we present SynLLM, a modular framework for generating high-quality synthetic medical tabular data using 20 state-of-the-art open-source LLMs, including LLaMA, Mistral, and GPT variants, guided by structured prompts. We propose four distinct prompt types, ranging from example-driven to rule-based constraints, that encode schema,…
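The abstract describes structured prompts that encode the table schema and constraints. As a minimal sketch of what a rule-based prompt of that kind might look like, the snippet below assembles one from a small schema. The column names, rules, wording, and the `build_rule_prompt` helper are all hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str   # column header in the generated CSV
    dtype: str  # informal type label shown to the model
    rule: str   # human-readable constraint for this column


# Hypothetical schema for illustration only; not the paper's dataset.
SCHEMA = [
    Column("age", "int", "between 18 and 90"),
    Column("sex", "category", "one of: M, F"),
    Column("systolic_bp", "int", "between 80 and 200"),
]


def build_rule_prompt(schema, n_rows):
    """Assemble a plain-text prompt listing the header and per-column rules."""
    header = ",".join(c.name for c in schema)
    lines = [
        f"Generate {n_rows} synthetic patient records as CSV.",
        f"Use exactly this header: {header}",
        "Follow these column rules:",
    ]
    lines += [f"- {c.name} ({c.dtype}): {c.rule}" for c in schema]
    lines.append("Do not copy any real patient record. Output only CSV rows.")
    return "\n".join(lines)


prompt = build_rule_prompt(SCHEMA, 5)
print(prompt)
```

The resulting string could be sent to any instruction-tuned model. Keeping the rules in a data structure rather than in free text makes it straightforward to reuse the same constraints later when validating the generated rows.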
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
