TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health

Zixin Xiong; Ziteng Wang; Haotian Fan; Xinjie Zhang; Wenxuan Wang

arXiv:2603.03047 · cs.CL · March 4, 2026

TL;DR

This paper introduces TrustMH-Bench, a comprehensive framework for evaluating the trustworthiness of large language models in mental health, addressing domain-specific concerns often overlooked by general evaluation methods.

Contribution

We propose TrustMH-Bench, a holistic benchmark that systematically assesses mental health LLMs across eight trustworthiness dimensions, filling a critical evaluation gap.

Findings

01 Models underperform in trustworthiness metrics in mental health scenarios.

02 Even advanced models like GPT-5.1 show significant deficiencies.

03 Systematic improvements are necessary for deploying trustworthy mental health LLMs.

Abstract

While Large Language Models (LLMs) demonstrate significant potential in providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the domain's high-stakes and safety-sensitive nature. Existing evaluation paradigms for general-purpose LLMs fail to capture mental health-specific requirements, highlighting an urgent need to prioritize and enhance their trustworthiness. To address this, we propose TrustMH-Bench, a holistic framework designed to systematically quantify the trustworthiness of mental health LLMs. By establishing a deep mapping from domain-specific norms to quantitative evaluation metrics, TrustMH-Bench evaluates models across eight core pillars: Reliability, Crisis Identification and Escalation, Safety, Fairness, Privacy, Robustness, Anti-sycophancy, and Ethics. We conduct extensive experiments across six…

Taxonomy

Topics: Mental Health via Writing · Digital Mental Health Interventions · Artificial Intelligence in Healthcare and Education