PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics

Yaling Shen; Stephanie Fong; Yiwen Jiang; Zimu Wang; Feilong Tang; Qingyang Xu; Xiangyu Zhao; Zhongxing Xu; Jiahe Liu; Jinpeng Hu; Dominic Dwyer; Zongyuan Ge

arXiv:2601.03578 · cs.CL · January 8, 2026

Open Access

TL;DR

This paper introduces PsychEthicsBench, a benchmark grounded in Australian psychology and psychiatry guidelines that evaluates the ethical knowledge and behavioral responses of large language models beyond simple refusal metrics. Results across 14 models show that refusal rates are poor indicators of ethical behavior.

Contribution

It presents the first principle-grounded benchmark for assessing LLMs' ethical knowledge and behavioral responses in mental health, moving beyond refusal-based safety signals and showing that domain-specific fine-tuning can reduce ethical robustness.

Findings

01. Refusal rates are poor indicators of ethical behavior.

02. Domain-specific fine-tuning can reduce ethical robustness.

03. Specialized models may underperform base models in ethical alignment.

Abstract

The increasing integration of large language models (LLMs) into mental health applications necessitates robust frameworks for evaluating professional safety alignment. Current evaluative approaches primarily rely on refusal-based safety signals, which offer limited insight into the nuanced behaviors required in clinical practice. In mental health, clinically inadequate refusals can be perceived as unempathetic and discourage help-seeking. To address this gap, we move beyond refusal-centric metrics and introduce PsychEthicsBench, the first principle-grounded benchmark based on Australian psychology and psychiatry guidelines, designed to evaluate LLMs' ethical knowledge and behavioral responses through multiple-choice and open-ended tasks with fine-grained ethicality annotations. Empirical results across 14 models reveal that refusal rates are poor indicators of ethical behavior,…

Taxonomy

Topics: Mental Health via Writing · Topic Modeling · Artificial Intelligence in Healthcare and Education