Knowledge-Guided Retrieval-Augmented Generation for Zero-Shot Psychiatric Data: Privacy Preserving Synthetic Data Generation
Adam Jakobsen, Sushant Gautam, Hugo Lewi Hammer, Susanne Olofsdotter, Miriam S Johanson, Pål Halvorsen, Vajira Thambawita

TL;DR
This paper introduces a zero-shot, knowledge-guided framework that uses large language models to generate privacy-preserving synthetic psychiatric data when sharing real data is restricted, achieving fidelity and privacy risk competitive with traditional models trained on real data.
Contribution
The study presents a novel knowledge-guided LLM approach for synthetic psychiatric data generation that enhances data fidelity and privacy compared to existing deep learning models.
Findings
Knowledge-guided LLM achieves competitive pairwise structure fidelity.
Clinical retrieval improves univariate and pairwise data fidelity.
The LLM, which uses no real data, shows low privacy risk similar to state-of-the-art models.
Abstract
AI systems in healthcare research have shown potential to increase patient throughput and assist clinicians, yet progress is constrained by limited access to real patient data. To address this issue, we present a zero-shot, knowledge-guided framework for psychiatric tabular data in which large language models (LLMs) are steered via Retrieval-Augmented Generation using the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and the International Classification of Diseases (ICD-10). We conducted experiments using different combinations of knowledge bases to generate privacy-preserving synthetic data. The resulting models were benchmarked against two state-of-the-art deep learning models for synthetic tabular data generation, namely CTGAN and TVAE, both of which rely on real data and therefore entail potential privacy risks. Evaluation was performed on six anxiety-related…
Taxonomy
Topics: Machine Learning in Healthcare · Mental Health via Writing · Digital Mental Health Interventions
