PsychBench: Auditing Epidemiological Fidelity in Large Language Model Mental Health Simulations

Patrick Keough

arXiv:2604.17359·cs.CY·April 21, 2026

TL;DR

This study introduces PsychBench, an epidemiological audit of large language model (LLM) mental health simulations. It finds that models produce clinically plausible individuals but misrepresent population distributions and encode racialized and gendered biases.

Contribution

The first comprehensive epidemiological evaluation of LLM patient simulations, highlighting population-level validity problems and biases in mental health modeling.

Findings

01  Models produce clinically plausible individuals but misrepresent population distributions.

02  Variance compression reduces population diversity, especially in clinical tails.

03  Models overestimate depression severity and encode racialized and gendered biases.
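To make finding 02 concrete, the following is a minimal sketch of one way variance compression could be quantified. It assumes the metric is one minus the ratio of simulated to reference variance; the listing does not give the paper's exact definition, so this formula, the function name `variance_compression`, and the synthetic scores are all illustrative assumptions rather than the author's method or data.

```python
import random
import statistics


def variance_compression(reference, simulated):
    """Fraction of the reference variance that is missing from the simulation.

    ASSUMPTION: compression = 1 - Var(simulated) / Var(reference).
    The paper's own definition may differ.
    """
    ref_var = statistics.pvariance(reference)
    sim_var = statistics.pvariance(simulated)
    return 1.0 - sim_var / ref_var


# Synthetic stand-in for a survey baseline (not real NHANES or NESARC-III data).
rng = random.Random(0)
reference = [rng.gauss(5.0, 4.0) for _ in range(5000)]

# Hypothetical simulator that pulls every score 20% closer to the mean.
# Scaling deviations by 0.8 scales the variance by 0.8 ** 2 = 0.64.
mean_ref = statistics.fmean(reference)
simulated = [mean_ref + 0.8 * (x - mean_ref) for x in reference]

compression = variance_compression(reference, simulated)
print(f"variance compression: {compression:.2%}")
```

Under this definition a simulator that preserves the population mean can still lose more than a third of the variance, and the loss falls mostly on the extreme scores, which is where the clinical tails sit.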

Abstract

Large language models are increasingly deployed to simulate patients for clinical training, research, and mental health tools, yet population-level validity remains largely untested. We introduce PsychBench, the first epidemiological audit of LLM patient simulation: 28,800 profiles from four frontier models (GPT-4o-mini, DeepSeek-V3, Gemini-3-Flash, GLM-4.7) evaluated against NHANES and NESARC-III baselines across 120 intersectional cohorts. The central finding is a coherence-fidelity dissociation: models produce clinically plausible individuals while misrepresenting the populations they are drawn from. Variance compression ranges from 14 percent (GLM-4.7) to 62 percent (DeepSeek-V3), eliminating the distributional tails of clinical reality. Despite test-retest correlations above r = 0.90, 36.66 percent of cases cross diagnostic thresholds between runs. Symptom correlation matrices…
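The abstract reports high test-retest correlation alongside frequent threshold crossings, which can look contradictory. The toy sketch below shows how both can hold at once: correlation rewards consistent rank ordering, while a diagnostic cutoff is sensitive to small shifts near the boundary. The scores, the cutoff of 10, and the helper names are hypothetical illustrations, not values or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def threshold_crossing_rate(xs, ys, cutoff):
    """Share of cases whose above/below-cutoff status differs between runs."""
    flips = sum((x >= cutoff) != (y >= cutoff) for x, y in zip(xs, ys))
    return flips / len(xs)


# Hypothetical severity scores for the same six profiles on two runs.
run_1 = [8, 9, 10, 11, 20, 2]
run_2 = [10, 9, 9, 12, 21, 3]
CUTOFF = 10  # ASSUMPTION: illustrative diagnostic threshold

r = pearson_r(run_1, run_2)
crossing = threshold_crossing_rate(run_1, run_2, CUTOFF)
print(f"test-retest r = {r:.3f}, threshold crossings = {crossing:.1%}")
```

In this toy data no score moves by more than two points, so the correlation stays above 0.9, yet two of the six profiles change classification because they sit next to the cutoff.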
