MentalBench: A DSM-Grounded Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models
Hoyun Song, Migyeong Kang, Jisu Shin, Jihyun Kim, Chanbi Park, Hangyeol Yoo, Jihyun An, Alice Oh, Jinyoung Han, KyungTae Lim

TL;DR
MentalBench is a new benchmark that uses a knowledge graph and synthetic clinical cases to evaluate whether large language models can make DSM-5-grounded psychiatric diagnoses, and it highlights current model limitations.
Contribution
The paper introduces MentalBench, a DSM-grounded benchmark with a validated knowledge graph and synthetic cases, to assess LLMs' psychiatric diagnostic capabilities.
Findings
LLMs perform well on noise-free DSM knowledge queries.
Models struggle with confidence calibration in complex, overlapping symptom cases.
Current LLMs may not be reliable for psychiatric decision support.
Abstract
Large language models (LLMs) have attracted growing interest as supportive tools for psychiatric assessment and clinical decision support. However, existing mental health benchmarks largely rely on social media data or supportive dialogue settings, limiting their ability to assess whether models can apply formal diagnostic criteria and differential diagnostic rules. In this paper, we introduce MentalBench, a benchmark for evaluating whether LLMs can make DSM-grounded psychiatric diagnostic decisions under varying levels of clinical ambiguity. At the core of MentalBench is MentalKG, a psychiatrist-built and validated knowledge graph encoding DSM-5 diagnostic criteria and differential diagnostic rules for 23 psychiatric disorders. Using MentalKG as an expert-curated logical backbone, we generate 24,750 synthetic clinical cases that systematically vary in information completeness and…
Taxonomy
Topics: Mental Health via Writing · Machine Learning in Healthcare · Digital Mental Health Interventions
