A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health
Alexander Marrapese, Basem Suleiman, Imdad Ullah, Juno Kim

TL;DR
This paper introduces a new evaluation framework with quantitative metrics for assessing the nuanced conversation abilities of large language models in mental health, emphasizing safety and effectiveness.
Contribution
The paper develops a transferable, literature-based framework and metrics for evaluating LLMs in mental health and applies them to popular models such as GPT and Llama, with a focus on safety-critical responses.
Findings
GPT4 Turbo closely resembles verified therapists in mental health conversations.
Performance varies across mental health topics, with high accuracy in Parenting and Relationships.
The framework can be transferred to adjacent domains beyond mental health.
Abstract
Understanding the conversation abilities of Large Language Models (LLMs) can lead to their more cautious and appropriate deployment. This is especially important for safety-critical domains like mental health, where someone's life may depend on the exact wording of a response to an urgent question. In this paper, we propose a novel framework for evaluating the nuanced conversation abilities of LLMs. Within it, we develop a series of quantitative metrics derived from the psychotherapy conversation analysis literature. Although we design our framework and metrics so that researchers can transfer them to relevant adjacent domains, we apply them here to the mental health field. We use our framework to evaluate several popular frontier LLMs, including some GPT and Llama models, on a verified mental health dataset. Our results show that GPT4 Turbo can perform significantly…
Taxonomy
Topics: Mental Health via Writing
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Cosine Annealing · Dropout · Byte Pair Encoding · Dense Connections · Adam
