# Beyond Simulations: What 20,000 Real Conversations Reveal About Mental Health AI Safety

**Authors:** Caitlin Stamatis, Jonah Meyerhoff, Richard Zhang, Olivier Tieleman, Matteo Malgaroli, Thomas Hull

PMC · DOI: 10.21203/rs.3.rs-8642399/v1 · Research Square · 2026-01-27

## TL;DR

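On four replicated safety benchmarks, a purpose-built mental health AI was significantly less likely than general-purpose frontier LLMs to produce harmful content. In a clinician-reviewed audit of 20,000 real user conversations, no suicide-risk case lacked crisis resources.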

## Contribution

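The study replicates four published simulation-based safety evaluations and adds an ecological audit of 20,000 real user conversations with a purpose-built mental health AI, so that safety is estimated from real-world language rather than from simulated prompts alone.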

## Key findings

- The purpose-built mental health AI was significantly less likely than general-purpose LLMs to produce harmful content on suicide and non-suicidal self-injury (NSSI) (0.4–11.27% vs 29.0–54.4%), eating disorder (8.4% vs 54.0%), and substance use (9.9% vs 45.0%) benchmarks.
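- In real user data, clinician review found no suicide-risk cases that lacked crisis resources. Three NSSI mentions (0.015% of the 20,000 conversations) lacked intervention, implying a lower-bound false negative rate of 0.38%.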
- The findings support the utility of ecological audits for estimating AI safety in real-world mental health applications.

## Abstract

Large language models (LLMs) are increasingly used for mental health, yet safety evaluations rely primarily on small, simulation-based benchmarks removed from real-world language. We replicate four published safety evaluations assessing suicide risk handling, harmful content generation, and jailbreak resistance for general-purpose frontier models and a purpose-built mental health AI. We then conduct an ecological audit of 20,000 real user conversations with the purpose-built system, which includes layered safeguards for suicide and non-suicidal self-injury (NSSI). The purpose-built AI was significantly less likely than general-purpose LLMs to produce harmful content across suicide/NSSI (0.4–11.27% vs 29.0–54.4%), eating disorder (8.4% vs 54.0%), and substance use (9.9% vs 45.0%) benchmarks. In real user data, clinician review found zero suicide-risk cases without crisis resources. Three NSSI mentions (0.015%) lacked intervention, implying a 0.38% lower-bound false negative rate. Findings support the utility of ecological audits for safety estimation.
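The two NSSI percentages in the abstract use different denominators, which a short calculation makes explicit. The sketch below checks that three missed mentions out of 20,000 conversations gives 0.015%. It then back-calculates the denominator that a 0.38% false negative rate would imply. That second step is an assumption: the abstract does not state the denominator, and the sketch assumes the rate is defined as missed cases divided by all true NSSI cases. The function names are illustrative and do not come from the paper.

```python
def percent(numerator: float, denominator: float) -> float:
    """Return numerator / denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def implied_denominator(numerator: float, rate_percent: float) -> float:
    """Back-calculate the denominator that yields rate_percent for numerator."""
    if rate_percent <= 0:
        raise ValueError("rate_percent must be positive")
    return numerator / (rate_percent / 100.0)


# Figures stated in the abstract.
missed_nssi = 3               # NSSI mentions that lacked intervention
total_conversations = 20_000  # conversations in the ecological audit
reported_fnr_percent = 0.38   # reported lower-bound false negative rate

# Share of all audited conversations with a missed NSSI mention.
share_of_conversations = percent(missed_nssi, total_conversations)

# ASSUMPTION: FNR = missed / (missed + correctly handled) NSSI cases.
# The result is an inferred quantity, not a number reported in the abstract.
implied_nssi_cases = implied_denominator(missed_nssi, reported_fnr_percent)

print(f"Share of conversations: {share_of_conversations:.3f}%")
print(f"Implied NSSI cases under the assumed definition: {implied_nssi_cases:.0f}")
```

Under that assumed definition, the 0.38% figure would correspond to several hundred NSSI cases rather than to all 20,000 conversations, which is why the two percentages differ by more than an order of magnitude.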

## Linked entities

- **Diseases:** eating disorder (MONDO:0005451)

## Full-text entities

- **Diseases:** non-suicidal self-injury (MESH:D012652), eating disorder (MESH:D001068)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12869570/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12869570/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12869570/full.md

---
Source: https://tomesphere.com/paper/PMC12869570