PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models

Haoan Jin; Siyuan Chen; Dilawaier Dilixiati; Yewei Jiang; Mengyue Wu; Kenny Q. Zhu

arXiv:2311.09189·cs.CL·June 4, 2024·5 cites

TL;DR

PsyEval introduces a comprehensive set of mental health-related tasks to evaluate large language models, revealing current limitations and guiding future improvements in this sensitive domain.

Contribution

This paper presents the first specialized evaluation suite, PsyEval, for assessing LLMs on mental health tasks, addressing a critical gap in model evaluation.

Findings

01. Significant performance gaps in current LLMs on mental health tasks

02. PsyEval covers five sub-tasks across three mental health dimensions

03. Results highlight the need for targeted model improvements in mental health understanding

Abstract

Evaluating Large Language Models (LLMs) in the mental health domain poses challenges distinct from those in other domains, given the subtle and highly subjective nature of symptoms, which vary significantly among individuals. This paper presents PsyEval, the first comprehensive suite of mental health-related tasks for evaluating LLMs. PsyEval encompasses five sub-tasks that evaluate three critical dimensions of mental health. This comprehensive framework is designed to thoroughly assess the unique challenges and intricacies of mental health-related tasks, making PsyEval a highly specialized and valuable tool for evaluating LLM performance in this domain. We evaluate twelve advanced LLMs using PsyEval. Experimental results not only demonstrate significant room for improvement in current LLMs concerning mental health but also reveal potential directions for future model optimization.

Code & Models

Repositories

kaguraruri/psy-eval
Official

Taxonomy

Topics: Mental Health via Writing · Topic Modeling