MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models

Weixin Liu; Congning Ni; Shelagh A. Mulvaney; Susannah L. Rose; Murat Kantarcioglu; Bradley A. Malin; and Zhijun Yin

arXiv:2605.15589·cs.CL·May 18, 2026

TL;DR

This paper introduces MHGraphBench, a knowledge-graph-grounded benchmark for evaluating how well large language models capture mental health knowledge. It reveals significant gaps in reasoning and in the reliability of model responses.

Contribution

The paper presents a KG-grounded benchmark, derived from PrimeKG, for assessing LLMs on mental-health entity recognition, relation judgment, and two-hop reasoning. It highlights persistent difficulty with relation prediction and two-hop reasoning.

Findings

01

Leading models reach near-ceiling performance on entity typing but still struggle with relation prediction and two-hop reasoning.

02

Short KG-derived snippets improve performance for some models and degrade it for others.

03

Output format reliability significantly affects evaluation results.

Abstract

Large language models (LLMs) are increasingly used in the mental health domain, yet it remains unclear how well they capture related biomedical knowledge and how reliably they apply it to clinically salient structured judgments. Here, we present a knowledge-graph (KG)-grounded benchmark for assessing LLMs on mental-health entity recognition, relation judgment, and two-hop reasoning. The benchmark is derived from PrimeKG and comprises nine task families with KG-supported answers and controlled negative options. Experiments across 15 closed- and open-source LLMs reveal a persistent recognition-to-judgment gap: leading models achieve near-ceiling performance on entity typing and on the small relation-typing subset, yet they still struggle with relation prediction and two-hop reasoning. Additionally, short KG-derived snippets benefit some models but degrade performance for others. Moreover,…
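To make the idea of "KG-supported answers and controlled negative options" concrete, the following is a minimal sketch of how one multiple-choice item and one two-hop query might be built from knowledge-graph triples. It is an illustration under stated assumptions, not the authors' construction procedure: the toy triples, the function names `build_item` and `two_hop_answers`, and the negative-sampling rule (distractors drawn from tails that appear with the same relation elsewhere) are all hypothetical.

```python
import random

# Toy (head, relation, tail) triples. Hypothetical data, not taken from PrimeKG.
TRIPLES = [
    ("drug_A", "indication", "disease_X"),
    ("drug_B", "indication", "disease_Y"),
    ("drug_D", "indication", "disease_Z"),
    ("drug_C", "contraindication", "disease_X"),
    ("disease_X", "associated_with", "gene_1"),
    ("disease_Y", "associated_with", "gene_2"),
]


def build_item(head, relation, triples, n_negatives=2, seed=0):
    """Build one multiple-choice item asking which tail `head` links to via `relation`."""
    rng = random.Random(seed)
    # KG-supported answers: every tail the graph records for (head, relation).
    positives = {t for h, r, t in triples if h == head and r == relation}
    if not positives:
        raise ValueError("no KG-supported answer for this query")
    # Assumed negative rule: tails seen with the same relation elsewhere,
    # so distractors are plausible but not supported for this head.
    candidates = sorted({t for h, r, t in triples if r == relation} - positives)
    if len(candidates) < n_negatives:
        raise ValueError("not enough negative candidates")
    answer = sorted(positives)[0]
    options = rng.sample(candidates, n_negatives) + [answer]
    rng.shuffle(options)
    return {
        "question": f"Which entity is linked to {head} by '{relation}'?",
        "options": options,
        "answer": answer,
    }


def two_hop_answers(head, r1, r2, triples):
    """Return tails reachable from `head` by following r1 and then r2."""
    mids = {t for h, r, t in triples if h == head and r == r1}
    return sorted({t for h, r, t in triples if h in mids and r == r2})


item = build_item("drug_A", "indication", TRIPLES)
two_hop = two_hop_answers("drug_A", "indication", "associated_with", TRIPLES)
```

Sampling distractors from tails of the same relation is only one way to control negatives; it keeps every option type-plausible, so a model cannot answer by entity type alone.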
