Towards Algorithmic Fidelity: Mental Health Representation across Demographics in Synthetic vs. Human-generated Data
Shinka Mori, Oana Ignat, Andrew Lee, Rada Mihalcea

TL;DR
This paper evaluates the ability of GPT-3 to generate synthetic mental health data across demographics, analyzing its representation of stressors and comparing it to human data to assess fidelity and limitations.
Contribution
It introduces HEADROOM, a new synthetic dataset of depression stressors across demographics, and provides methods to analyze and compare synthetic and human-generated mental health data.
Findings
Synthetic data partially replicates human data distribution of stressors.
GPT-3 can generate demographic-specific depression stressors.
Analysis reveals limitations in LLMs for fully capturing demographic nuances.
Abstract
Synthetic data generation has the potential to impact applications and domains with scarce data. However, before such data is used for sensitive tasks such as mental health, we need an understanding of how different demographics are represented in it. In our paper, we analyze the potential of producing synthetic data using GPT-3 by exploring the various stressors it attributes to different race and gender combinations, to provide insight for future researchers looking into using LLMs for data generation. Using GPT-3, we develop HEADROOM, a synthetic dataset of 3,120 posts about depression-triggering stressors, by controlling for race, gender, and time frame (before and after COVID-19). Using this dataset, we conduct semantic and lexical analyses to (1) identify the predominant stressors for each demographic group; and (2) compare our synthetic data to a human-generated dataset. We…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMental Health Research Topics · Health disparities and outcomes
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · 15 Ways to Contact How can i speak to someone at Delta Airlines · Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Cosine Annealing · Multi-Head Attention · Linear Warmup With Cosine Annealing · Softmax
