Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

Ian Steenstra; Paola Pedrelli; Weiyan Shi; Stacy Marsella; Timothy W. Bickmore

arXiv:2602.19948 · cs.CL · March 6, 2026

TL;DR

This paper presents a comprehensive evaluation framework for assessing safety risks of large language models in mental health support, highlighting critical safety gaps and the importance of simulation-based red teaming before deployment.

Contribution

The paper introduces a framework that pairs AI psychotherapists with simulated patient agents to identify safety risks in AI mental health support systems.

Findings

01. Identified safety gaps such as AI psychosis and failure to de-escalate suicide risk.

02. Validated the framework with 369 simulated therapy sessions across diverse clinical phenotypes.

03. Demonstrated stakeholder utility through an interactive safety assessment dashboard.

Abstract

Large Language Models (LLMs) are increasingly utilized for mental health support; however, current safety benchmarks often fail to detect the complex, longitudinal risks inherent in therapeutic dialogue. We introduce an evaluation framework that pairs AI psychotherapists with simulated patient agents equipped with dynamic cognitive-affective models and assesses therapy session simulations against a comprehensive quality of care and risk ontology. We apply this framework to a high-impact test case, Alcohol Use Disorder, evaluating six AI agents (including ChatGPT, Gemini, and Character AI) against a clinically-validated cohort of 15 patient personas representing diverse clinical phenotypes. Our large-scale simulation (N=369 sessions) reveals critical safety gaps in the use of AI for mental health support. We identify specific iatrogenic risks, including the validation of patient…

Code & Models

Datasets

IanSteenstra/AI-Psychotherapy-Eval
dataset · 15 dl

Taxonomy

Topics: Digital Mental Health Interventions · Mental Health via Writing · Artificial Intelligence in Healthcare and Education