Beyond Simulations: What 20,000 Real Conversations Reveal About Mental Health AI Safety
Caitlin A. Stamatis, Jonah Meyerhoff, Richard Zhang, Olivier Tieleman, Matteo Malgaroli, Thomas D. Hull

TL;DR
This study compares safety test set performance with real-world data for a mental health AI and finds that real-world safety failures are rarer than test set failures, underscoring the need for ongoing safety evaluation in deployment.
Contribution
It provides the first ecological audit of over 20,000 real conversations, highlighting discrepancies between test set results and real-world safety performance of mental health AI.
Findings
Purpose-built AI significantly reduces harmful content compared to general-purpose LLMs.
Test set failure rates are higher than real-world failure rates.
Real-world safety failures are rare, supporting continuous safety monitoring.
Abstract
Large language models (LLMs) are increasingly used for mental health support, yet existing safety evaluations rely primarily on small, simulation-based test sets that have an unknown relationship to the linguistic distribution of real usage. In this study, we present replications of four published safety test sets targeting suicide risk assessment, harmful content generation, refusal robustness, and adversarial jailbreaks for a leading frontier generic AI model alongside an AI purpose-built for mental health support. We then propose and conduct an ecological audit on over 20,000 real-world user conversations with the purpose-built AI, designed with layered suicide and non-suicidal self-injury (NSSI) safeguards, to compare test set performance to real-world performance. While the purpose-built AI was significantly less likely than general-purpose LLMs to produce enabling or harmful content…
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Artificial Intelligence in Healthcare and Education
