Synthetic Data Generation with LLM for Improved Depression Prediction

Andrea Kang; Jun Yu Chen; Zoe Lee-Youngzie; Shuhao Fu

arXiv:2411.17672·cs.LG·November 27, 2024·5 cites

TL;DR

This paper presents a novel pipeline using Large Language Models to generate synthetic clinical interview data, improving depression prediction accuracy while addressing data privacy and scarcity issues.

Contribution

The study introduces a chain-of-thought prompting method with LLMs to create synthetic, privacy-preserving data that balances the depression severity distribution for better model training.
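One way to make "balancing the severity distribution" concrete is to count how many synthetic samples each severity band needs. The sketch below is illustrative only. It assumes a PHQ-8-style score in the range 0 to 24 with the usual PHQ cutoffs, and a hypothetical "top every band up to the largest band" strategy. Neither the score scale, the band boundaries, nor the strategy is stated on this page, and the names `severity_band` and `synthetic_quota` are made up for illustration.

```python
from collections import Counter

# ASSUMPTION: PHQ-8-style score (0-24) with standard PHQ severity cutoffs.
# The paper's actual scale and binning may differ.
SEVERITY_BANDS = [
    ("none/minimal", 0, 4),
    ("mild", 5, 9),
    ("moderate", 10, 14),
    ("moderately severe", 15, 19),
    ("severe", 20, 24),
]


def severity_band(score: int) -> str:
    """Map a depression score to its (assumed) severity band."""
    for name, low, high in SEVERITY_BANDS:
        if low <= score <= high:
            return name
    raise ValueError(f"score {score} outside assumed 0-24 range")


def synthetic_quota(scores: list[int]) -> dict[str, int]:
    """Hypothetical balancing rule: number of synthetic samples to generate
    per band so that every band matches the currently largest band."""
    counts = Counter(severity_band(s) for s in scores)
    target = max(counts.values(), default=0)
    return {name: target - counts.get(name, 0) for name, _, _ in SEVERITY_BANDS}


if __name__ == "__main__":
    real_scores = [0, 2, 3, 3, 6, 7, 11, 16, 21]
    # -> {'none/minimal': 0, 'mild': 2, 'moderate': 3,
    #     'moderately severe': 3, 'severe': 3}
    print(synthetic_quota(real_scores))
```

Topping up to the largest band only ever adds data, so none of the real transcripts are discarded; the trade-off is that sparse bands (often the severe end) become dominated by synthetic samples.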

Findings

01

Synthetic data achieved high fidelity and privacy metrics

02

Balanced depression severity distribution improved prediction performance

03

Method effectively addresses data scarcity and privacy concerns

Abstract

Automatic detection of depression is a rapidly growing field of research at the intersection of psychology and machine learning. However, with this exponential growth in interest comes a growing concern about data privacy and scarcity, given the sensitivity of the topic. In this paper, we propose a pipeline for Large Language Models (LLMs) to generate synthetic data to improve the performance of depression prediction models. Starting from unstructured, naturalistic text data from recorded transcripts of clinical interviews, we utilize an open-source LLM to generate synthetic data through chain-of-thought prompting. This pipeline involves two key steps: the first step is the generation of the synopsis and sentiment analysis based on the original transcript and depression score, while the second is the generation of the synthetic synopsis/sentiment analysis based on the summaries generated in the…
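The two-step pipeline described in the abstract can be sketched as a pair of chained prompts. This is a minimal sketch under stated assumptions, not the authors' implementation: the prompt wording, the `SYNOPSIS:`/`SENTIMENT:` output format, the use of a target score in step 2, and the helper names (`STEP1_TEMPLATE`, `STEP2_TEMPLATE`, `parse_summary`, `synthesize`) are all hypothetical. The LLM is passed in as a plain `generate` callable because the page does not name the open-source model used.

```python
from dataclasses import dataclass
from typing import Callable

# HYPOTHETICAL prompt templates; the paper's actual prompts are not given here.
STEP1_TEMPLATE = (
    "You are given a clinical interview transcript and the participant's "
    "depression score ({score}).\n"
    "Think step by step: first summarize the interview, then analyze the "
    "participant's sentiment.\n"
    "Transcript:\n{transcript}\n\n"
    "Answer in the form:\nSYNOPSIS: ...\nSENTIMENT: ..."
)

STEP2_TEMPLATE = (
    "Here is a synopsis and sentiment analysis of an interview with a "
    "participant whose depression score is {source_score}.\n"
    "SYNOPSIS: {synopsis}\nSENTIMENT: {sentiment}\n\n"
    "Think step by step about how the content would differ for a participant "
    "with depression score {target_score}, then write a new, fictional "
    "synopsis and sentiment analysis for that participant.\n"
    "Answer in the form:\nSYNOPSIS: ...\nSENTIMENT: ..."
)


@dataclass
class Summary:
    synopsis: str
    sentiment: str
    score: int


def parse_summary(text: str, score: int) -> Summary:
    """Pull the SYNOPSIS/SENTIMENT fields out of a model response.
    Assumes each field fits on a single line; reasoning text is ignored."""
    synopsis, sentiment = "", ""
    for line in text.splitlines():
        if line.startswith("SYNOPSIS:"):
            synopsis = line.removeprefix("SYNOPSIS:").strip()
        elif line.startswith("SENTIMENT:"):
            sentiment = line.removeprefix("SENTIMENT:").strip()
    return Summary(synopsis, sentiment, score)


def synthesize(transcript: str, score: int, target_score: int,
               generate: Callable[[str], str]) -> Summary:
    """Step 1 summarizes the real transcript; step 2 writes a synthetic
    synopsis/sentiment analysis conditioned only on the step-1 output."""
    step1 = generate(STEP1_TEMPLATE.format(score=score, transcript=transcript))
    real = parse_summary(step1, score)
    step2 = generate(STEP2_TEMPLATE.format(
        source_score=real.score, synopsis=real.synopsis,
        sentiment=real.sentiment, target_score=target_score))
    return parse_summary(step2, target_score)


if __name__ == "__main__":
    # Stub standing in for a real LLM call, so the sketch runs offline.
    def fake_llm(prompt: str) -> str:
        return "SYNOPSIS: Participant describes poor sleep.\nSENTIMENT: Mostly negative."

    print(synthesize("Interviewer: How are you?\nParticipant: Tired.", 6, 18, fake_llm))
```

Because step 2 sees only the step-1 summary and never the raw transcript, verbatim interview content has no direct path into the synthetic output, which matches the privacy motivation in the abstract.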

