Fidelity, Diversity, and Privacy: A Multi-Dimensional LLM Evaluation for Clinical Data Augmentation
Guillermo Iglesias, Gema Bello-Orgaz, María Navas-Loro, Cristian Ramirez-Atencia, Mercè Salvador Robert, Enrique Baca-Garcia

TL;DR
This paper presents a multi-dimensional evaluation framework for synthetic mental health reports generated by LLMs, balancing fidelity, diversity, and privacy to enhance clinical NLP data without breaching confidentiality.
Contribution
It introduces a comprehensive evaluation method for synthetic clinical texts and demonstrates that LLMs can generate safe, diverse, and coherent reports for mental health data augmentation.
Findings
Generated reports are clinically coherent and diverse.
Models produce privacy-safe synthetic data.
Synthetic data significantly expands training datasets.
Abstract
The scarcity of high-quality annotated medical data, particularly in mental health, is a significant bottleneck for training robust machine learning models. Privacy regulations restrict data sharing, making synthetic data generation a promising alternative. Large Language Models (LLMs) can serve as the generator in such a data augmentation pipeline. In the proposed methodology, DeepSeek-R1, OpenBioLLM-Llama3 and Qwen 3.5 are used to generate synthetic mental health evaluation reports conditioned on specific International Classification of Diseases, Tenth Revision (ICD-10) codes. Because naive text generation can lead to mode collapse or privacy breaches (memorization), a comprehensive evaluation framework is introduced. The generated diagnostic texts are assessed across three dimensions: semantic fidelity, lexical diversity, and privacy/plagiarism. The…
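The truncated abstract names the evaluation dimensions but not the metrics behind them. As a rough illustration only, two of the dimensions can be approximated with common proxies: distinct-n for lexical diversity and n-gram overlap against real reports for privacy/plagiarism. The function names, the whitespace tokenization, and the choice of metrics below are assumptions for this sketch, not the paper's method. Semantic fidelity would typically need an embedding model and is left out.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(texts, n=2):
    """Lexical-diversity proxy (assumed metric): unique n-grams divided by
    total n-grams over a set of generated reports. Values near 0 suggest
    repetitive output (possible mode collapse)."""
    all_ngrams = []
    for text in texts:
        all_ngrams.extend(ngrams(text.lower().split(), n))
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


def max_ngram_overlap(candidate, references, n=3):
    """Privacy/plagiarism proxy (assumed metric): the largest fraction of
    the candidate's unique n-grams that also appear in any single real
    report. Values near 1 suggest near-verbatim copying (memorization)."""
    cand = set(ngrams(candidate.lower().split(), n))
    if not cand:
        return 0.0
    best = 0.0
    for ref in references:
        ref_set = set(ngrams(ref.lower().split(), n))
        best = max(best, len(cand & ref_set) / len(cand))
    return best
```

In a pipeline like the one described, a generated report might be flagged for review when `max_ngram_overlap` exceeds some threshold, while `distinct_n` would be tracked per ICD-10 code to spot collapsed outputs. Any such threshold is a design choice that the abstract does not specify.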
