An AI-Based Behavioral Health Safety Filter and Dataset for Identifying Mental Health Crises in Text-Based Conversations

Benjamin W. Nelson; Celeste Wong; Matthew T. Silvestrini; Sooyoon Shin; Alanna Robinson; Jessica Lee; Eric Yang; John Torous; Andrew Trister

arXiv:2510.12083 · cs.CL · October 15, 2025 · 2 cites

TL;DR

This study introduces a new AI-based safety filter for mental health crisis detection in text conversations, demonstrating superior performance over existing content moderation tools across multiple datasets.

Contribution

The paper presents the Verily Behavioral Health Safety Filter (VBHSF), a novel AI tool with high sensitivity and specificity for identifying mental health crises, validated on clinician-labelled datasets.

Findings

1. VBHSF achieved high sensitivity (0.990) and specificity (0.992) on the Verily Mental Health Crisis Dataset v1.0.
2. VBHSF outperformed existing guardrails such as NVIDIA NeMo Guardrails and OpenAI Omni Moderation Latest in sensitivity.
3. VBHSF demonstrated robust, generalizable performance across both evaluation datasets.

Abstract

Large language models often mishandle psychiatric emergencies, offering harmful or inappropriate advice and enabling destructive behaviors. This study evaluated the Verily Behavioral Health Safety Filter (VBHSF) on two datasets: the Verily Mental Health Crisis Dataset, containing 1,800 simulated messages, and a 794-message mental health-related subset of the NVIDIA Aegis AI Content Safety Dataset. Both datasets were clinician-labelled, and performance was evaluated against these clinician labels. Additionally, we carried out comparative performance analyses against two publicly available content moderation guardrails: OpenAI Omni Moderation Latest and NVIDIA NeMo Guardrails. The VBHSF demonstrated well-balanced performance on the Verily Mental Health Crisis Dataset v1.0, achieving high sensitivity (0.990) and specificity (0.992) in detecting any mental health crisis. It achieved an F1-score of…
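The abstract reports sensitivity, specificity, and F1-score against clinician labels. As a rough illustration of how such figures are derived, the sketch below scores binary "any crisis" predictions against reference labels using the standard confusion-matrix definitions. It is not the authors' evaluation code: treating the task as a single crisis/no-crisis decision is an assumption, the names `crisis_metrics` and `BinaryMetrics` are hypothetical, and the toy labels are made up rather than drawn from either dataset.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    sensitivity: float  # TP / (TP + FN): share of true crises that were flagged
    specificity: float  # TN / (TN + FP): share of non-crises correctly left unflagged
    precision: float    # TP / (TP + FP): share of flags that were true crises
    f1: float           # harmonic mean of precision and sensitivity


def _safe_div(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty
    # (e.g., no positive cases in a small sample).
    return num / den if den else 0.0


def crisis_metrics(clinician: Sequence[bool], predicted: Sequence[bool]) -> BinaryMetrics:
    """Score hypothetical binary 'any crisis' predictions against clinician labels."""
    if len(clinician) != len(predicted):
        raise ValueError("label and prediction sequences must have equal length")
    pairs = list(zip(clinician, predicted))
    tp = sum(1 for c, p in pairs if c and p)
    fn = sum(1 for c, p in pairs if c and not p)
    fp = sum(1 for c, p in pairs if p and not c)
    tn = sum(1 for c, p in pairs if not c and not p)
    return BinaryMetrics(
        sensitivity=_safe_div(tp, tp + fn),
        specificity=_safe_div(tn, tn + fp),
        precision=_safe_div(tp, tp + fp),
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
        f1=_safe_div(2 * tp, 2 * tp + fp + fn),
    )


if __name__ == "__main__":
    # Toy, made-up labels for illustration only (not from either dataset).
    clinician = [True, True, True, False, False, False, False, True]
    predicted = [True, True, False, False, False, True, False, True]
    print(crisis_metrics(clinician, predicted))
```

Reporting sensitivity and specificity together matters for a safety filter: a filter that flags every message reaches a sensitivity of 1.0 but a specificity of 0.0, so the "well-balanced" description in the abstract refers to both being high at once.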

Taxonomy

Topics: Mental Health via Writing