A Novel Nuanced Conversation Evaluation Framework for Large Language Models in Mental Health

Alexander Marrapese; Basem Suleiman; Imdad Ullah; Juno Kim

arXiv:2403.09705·cs.CL·March 18, 2024·2 cites

Open Access

TL;DR

This paper introduces a new evaluation framework with quantitative metrics for assessing the nuanced conversation abilities of large language models in mental health, emphasizing safety and effectiveness.

Contribution

It develops a transferable, literature-based framework and metrics for evaluating LLMs in mental health, applied to popular models like GPT and Llama, with a focus on safety-critical responses.

Findings

01 GPT4 Turbo's responses closely resemble those of verified therapists in mental health conversations.

02 Performance varies across mental health topics, with high accuracy in Parenting and Relationships.

03 The framework can be applied to other domains beyond mental health.

Abstract

Understanding the conversation abilities of Large Language Models (LLMs) can help lead to their more cautious and appropriate deployment. This is especially important for safety-critical domains like mental health, where someone's life may depend on the exact wording of a response to an urgent question. In this paper, we propose a novel framework for evaluating the nuanced conversation abilities of LLMs. Within it, we develop a series of quantitative metrics drawn from the literature on psychotherapy conversation analysis. While we ensure that our framework and metrics are transferable by researchers to relevant adjacent domains, we apply them to the mental health field. We use our framework to evaluate several popular frontier LLMs, including some GPT and Llama models, through a verified mental health dataset. Our results show that GPT4 Turbo can perform significantly…

Taxonomy

Topics: Mental Health via Writing

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Cosine Annealing · Dropout · Byte Pair Encoding · Dense Connections · Adam