Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge

Yunna Cai; Fan Wang; Haowei Wang; Kun Wang; Kailai Yang; Sophia Ananiadou; Moyan Li; Mingming Fan

arXiv:2508.08236 · cs.CL · February 16, 2026

TL;DR

This paper introduces PsyCrisis-Bench, a reference-free evaluation benchmark for assessing the safety alignment of Chinese mental health dialogue models using an LLM-as-Judge approach grounded in psychological principles.

Contribution

The paper proposes a prompt-based LLM-as-Judge method that evaluates safety without gold-standard answers, and presents a manually curated, high-quality Chinese dataset of mental health dialogues.

Findings

01. Achieves high agreement with expert assessments

02. Provides interpretable safety evaluation rationales

03. Outperforms existing evaluation approaches

Abstract

Evaluating the safety alignment of LLM responses in high-risk mental health dialogues is particularly difficult due to missing gold-standard answers and the ethically sensitive nature of these interactions. To address this challenge, we propose PsyCrisis-Bench, a reference-free evaluation benchmark based on real-world Chinese mental health dialogues. It evaluates whether the model responses align with the safety principles defined by experts. Specifically designed for settings without standard references, our method adopts a prompt-based LLM-as-Judge approach that conducts in-context evaluation using expert-defined reasoning chains grounded in psychological intervention principles. We employ binary point-wise scoring across multiple safety dimensions to enhance the explainability and traceability of the evaluation. Additionally, we present a manually curated, high-quality…
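
The binary point-wise scoring described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's implementation: the dimension names, the `dimension | score | rationale` judge output format, the helper names `parse_judge_line` and `score_response`, and summing the binary scores into a total. The abstract does not list the safety dimensions or say how per-dimension scores are aggregated.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    dimension: str   # name of the safety dimension (hypothetical names in this sketch)
    passed: bool     # binary point-wise score: True for 1, False for 0
    rationale: str   # judge's explanation, kept so each score stays traceable


def parse_judge_line(line: str) -> Judgment:
    """Parse one line in the assumed format 'dimension | 0 or 1 | rationale'."""
    parts = [part.strip() for part in line.split("|", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected three '|'-separated fields, got {line!r}")
    dimension, score, rationale = parts
    if score not in ("0", "1"):
        raise ValueError(f"score must be 0 or 1, got {score!r}")
    return Judgment(dimension, score == "1", rationale)


def score_response(judge_output: str) -> dict:
    """Collect per-dimension binary scores and rationales from a judge's raw output."""
    judgments = [
        parse_judge_line(line)
        for line in judge_output.splitlines()
        if line.strip()
    ]
    per_dimension = {j.dimension: int(j.passed) for j in judgments}
    return {
        "per_dimension": per_dimension,
        # Summing is an assumption of this sketch, not a rule stated in the abstract.
        "total": sum(per_dimension.values()),
        "rationales": {j.dimension: j.rationale for j in judgments},
    }
```

Calling `score_response` on a judge's raw text returns one 0/1 score per dimension alongside its rationale, so each score can be traced back to the judge's stated reasoning.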

Taxonomy

Topics: Mental Health via Writing · Digital Mental Health Interventions · Topic Modeling