MHSafeEval: Role-Aware Interaction-Level Evaluation of Mental Health Safety in Large Language Models

Suhyun Lee; Palakorn Achananuparp; Neemesh Yadav; Ee-Peng Lim; Yang Deng

arXiv:2604.17730 · cs.CL · April 21, 2026

TL;DR

This paper introduces a role-aware, multi-turn evaluation framework for assessing mental health safety in large language models. It reveals role-dependent safety failures that static benchmarks miss.

Contribution

The paper presents R-MHSafe, a role-aware mental health safety taxonomy, and MHSafeEval, a closed-loop, agent-based evaluation framework for dynamic, interaction-level safety assessment of LLMs in mental health contexts.

Findings

01. Significant role-dependent safety failures identified.
02. Cumulative safety issues emerge over multi-turn interactions.
03. Framework improves detection of failure modes compared to static benchmarks.

Abstract

Large language models (LLMs) are increasingly explored as scalable tools for mental health counseling, yet evaluating their safety remains challenging due to the interactional and context-dependent nature of clinical harm. Existing evaluation frameworks predominantly assess isolated responses using coarse-grained taxonomies or static datasets, limiting their ability to diagnose how harms emerge and accumulate over multi-turn counseling interactions. In this work, we introduce R-MHSafe, a role-aware mental health safety taxonomy that characterizes clinically significant harm in terms of the interactional roles an AI counselor adopts, including perpetrator, instigator, facilitator, or enabler, combined with clinically grounded harm categories. Then, we propose MHSafeEval, a closed-loop, agent-based evaluation framework that formulates safety assessment as trajectory-level discovery of…
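As a rough illustration of what a role-aware label could look like, the sketch below pairs one of the four roles named in the abstract with a harm category and counts labels per role over a multi-turn conversation. It is not the authors' implementation: the `HarmLabel` type, the `tally_by_role` helper, and the category names `category_a` and `category_b` are all hypothetical, since the abstract shown here does not list the harm categories or describe the scoring procedure.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # The four interactional roles named in the abstract.
    PERPETRATOR = "perpetrator"
    INSTIGATOR = "instigator"
    FACILITATOR = "facilitator"
    ENABLER = "enabler"


@dataclass(frozen=True)
class HarmLabel:
    # Hypothetical label type: a role combined with a harm category.
    role: Role
    category: str  # placeholder string; real category names are not given here


def tally_by_role(turn_labels):
    """Count harm labels per role across all turns of one conversation.

    turn_labels holds one list of HarmLabel per counselor turn.
    """
    counts = Counter()
    for labels in turn_labels:
        for label in labels:
            counts[label.role] += 1
    return counts


# Hypothetical three-turn trajectory with made-up category names.
trajectory = [
    [],
    [HarmLabel(Role.ENABLER, "category_a")],
    [HarmLabel(Role.ENABLER, "category_a"),
     HarmLabel(Role.FACILITATOR, "category_b")],
]
counts = tally_by_role(trajectory)
```

Counting over the whole trajectory rather than per response is the point of the sketch: a harm pattern that looks minor in any single turn can still add up across the conversation.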

Code & Models

Datasets

Suhyunlee/MHSafeEval
dataset · 31 downloads
