SynSym: A Synthetic Data Generation Framework for Psychiatric Symptom Identification
Migyeong Kang, Jihyun Kim, Hyolim Jeon, Sunwoo Hwang, Jihyun An, Yonghoon Kim, Haewoon Kwak, Jisun An, Jinyoung Han

TL;DR
SynSym is a framework that uses large language models to generate synthetic psychiatric symptom data, improving model training for mental health analysis from social media posts without extensive expert labeling.
Contribution
The paper introduces a synthetic data generation method that uses LLMs to create diverse, realistic symptom expressions for training psychiatric symptom identification models.
Findings
Models trained on SynSym data perform comparably to models trained on real data.
Synthetic data improves model generalizability across different symptom expression styles.
Fine-tuning with real data further enhances model performance.
Abstract
Psychiatric symptom identification on social media aims to infer fine-grained mental health symptoms from user-generated posts, allowing a detailed understanding of users' mental states. However, the construction of large-scale symptom-level datasets remains challenging due to the resource-intensive nature of expert labeling and the lack of standardized annotation guidelines, which in turn limits the generalizability of models to identify diverse symptom expressions from user-generated text. To address these issues, we propose SynSym, a synthetic data generation framework for constructing generalizable datasets for symptom identification. Leveraging large language models (LLMs), SynSym constructs high-quality training samples by (1) expanding each symptom into sub-concepts to enhance the diversity of generated expressions, (2) producing synthetic expressions that reflect psychiatric…
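The first two steps named in the abstract (expanding a symptom into sub-concepts, then generating expressions for each) might be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the prompts, the function names `expand_symptom` and `generate_examples`, the output fields, and the `stub_llm` stand-in are all assumptions made here so the example runs offline without any real model.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function mapping a prompt string to generated text.
LLM = Callable[[str], str]


def expand_symptom(symptom: str, llm: LLM, n: int = 3) -> List[str]:
    """Step (1), as assumed here: ask the model for sub-concepts, one per line."""
    prompt = (
        f"List {n} distinct sub-concepts of the psychiatric symptom "
        f"'{symptom}', one per line."
    )
    # Drop bullet characters and surrounding whitespace, skip empty lines.
    lines = [ln.strip("-• \t") for ln in llm(prompt).splitlines()]
    return [ln for ln in lines if ln][:n]


def generate_examples(symptom: str, llm: LLM, n_sub: int = 3) -> List[Dict[str, str]]:
    """Step (2), as assumed here: one synthetic post per sub-concept,
    labeled with the parent symptom."""
    samples = []
    for sub in expand_symptom(symptom, llm, n_sub):
        prompt = f"Write a short first-person social media post expressing: {sub}"
        samples.append(
            {"text": llm(prompt).strip(), "label": symptom, "sub_concept": sub}
        )
    return samples


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model (illustration only)."""
    if prompt.startswith("List"):
        return "- trouble falling asleep\n- waking too early\n- sleeping far too much"
    return "post about " + prompt.split(": ", 1)[1]


samples = generate_examples("sleep disturbance", stub_llm)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real pipeline would replace `stub_llm` and would also need whatever later steps the truncated abstract goes on to describe.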
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Machine Learning in Healthcare
