Generating High Quality Synthetic Data for Dutch Medical Conversations

Cecilia Kuan; Aditya Kamlesh Parikh; Henk van den Heuvel

arXiv:2604.09645·cs.CL·April 14, 2026

TL;DR

This paper presents a pipeline for creating synthetic Dutch medical dialogues using a fine-tuned Large Language Model, aiming to enhance clinical NLP resources while addressing privacy concerns.

Contribution

The study introduces a novel method for generating synthetic Dutch medical conversations, evaluated through both quantitative metrics and expert qualitative review.

Findings

01 Synthetic dialogues show strong lexical variety but overly regular turn-taking, which suggests scripted rather than natural conversation flow.

02 Qualitative review indicates issues with domain specificity and naturalness.

03 Quantitative metrics alone do not fully capture linguistic quality.

Abstract

Medical conversations offer insights into clinical communication often absent from Electronic Health Records. However, developing reliable clinical Natural Language Processing (NLP) models is hampered by the scarcity of domain-specific datasets, as clinical data are typically inaccessible due to privacy and ethical constraints. To address these challenges, we present a pipeline for generating synthetic Dutch medical dialogues using a Dutch fine-tuned Large Language Model, with real medical conversations serving as linguistic and structural reference. The generated dialogues were evaluated through quantitative metrics and qualitative review by native speakers and medical practitioners. Quantitative analysis revealed strong lexical variety and overly regular turn-taking, suggesting scripted rather than natural conversation flow. Qualitative review produced slightly below-average scores,…
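
As a rough illustration of the two quantitative signals the abstract mentions (lexical variety and regularity of turn-taking), the sketch below computes a type-token ratio, the coefficient of variation of turn lengths, and a speaker alternation rate. These particular measures, the function names, and the toy Dutch dialogue are assumptions made for illustration; the summary above does not say which metrics the authors actually used.

```python
import re
from statistics import mean, pstdev

# Hypothetical toy dialogue, not taken from the paper's data.
dialogue = [
    ("arts", "Goedemorgen, wat kan ik voor u doen?"),
    ("patient", "Ik heb al een week hoofdpijn."),
    ("arts", "Heeft u ook koorts?"),
    ("patient", "Nee, geen koorts."),
]
speakers = [speaker for speaker, _ in dialogue]
turns = [text for _, text in dialogue]


def tokenize(text):
    """Lowercase word tokenizer (a simplifying assumption, not a Dutch-specific one)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(turns):
    """Unique tokens divided by total tokens; higher means more lexical variety."""
    tokens = [tok for turn in turns for tok in tokenize(turn)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def turn_length_cv(turns):
    """Coefficient of variation of turn lengths in tokens.

    Values near 0 mean every turn has a similar length, which may hint
    at scripted rather than natural conversation flow.
    """
    lengths = [len(tokenize(turn)) for turn in turns]
    if not lengths:
        return 0.0
    avg = mean(lengths)
    return pstdev(lengths) / avg if avg else 0.0


def speaker_alternation_rate(speakers):
    """Fraction of consecutive turn pairs where the speaker changes.

    A value of exactly 1.0 means strict A-B-A-B alternation with no
    consecutive turns by the same speaker.
    """
    if len(speakers) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(speakers, speakers[1:]))
    return switches / (len(speakers) - 1)


if __name__ == "__main__":
    print("type-token ratio:", round(type_token_ratio(turns), 3))
    print("turn-length CV:", round(turn_length_cv(turns), 3))
    print("alternation rate:", speaker_alternation_rate(speakers))
```

Comparing such values between real reference conversations and generated ones is one plausible way to surface the "overly regular turn-taking" pattern, though, as the findings note, numbers like these would not capture naturalness or domain specificity on their own.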
