Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations

Mohit Chandra; Siddharth Sriraman; Harneet Singh Khanuja; Yiqiao Jin; Munmun De Choudhury

arXiv:2505.20201·cs.CL·May 29, 2025

Open Access

TL;DR

This paper introduces the MedAgent and MultiSenseEval frameworks for generating and evaluating multi-turn mental health conversations with LLMs, and shows where current models fall short in patient-centric communication and diagnostic capability.

Contribution

The paper presents a novel synthetic data generation framework, a new dataset, and a holistic evaluation method for assessing LLMs in multi-turn mental health dialogues.

Findings

01

Frontier reasoning models perform below expectations in patient-centric communication.

02

Models struggle with advanced diagnostic capabilities, scoring around 31%.

03

Performance varies with the patient's persona and declines as the number of conversation turns increases.

Abstract

Limited access to mental healthcare, extended wait times, and increasing capabilities of Large Language Models (LLMs) have led individuals to turn to LLMs for fulfilling their mental health needs. However, the multi-turn mental health conversation capabilities of LLMs remain under-explored. Existing evaluation frameworks typically focus on diagnostic accuracy and win-rates and often overlook alignment with patient-specific goals, values, and personalities required for meaningful conversations. To address this, we introduce MedAgent, a novel framework for synthetically generating realistic, multi-turn mental health sensemaking conversations, and use it to create the Mental Health Sensemaking Dialogue (MHSD) dataset, comprising over 2,200 patient-LLM conversations. Additionally, we present MultiSenseEval, a holistic framework to evaluate the multi-turn conversation abilities of…

Taxonomy

Topics: Machine Learning in Healthcare · Mental Health via Writing · Topic Modeling

MethodsFocus