Measuring What Matters!! Assessing Therapeutic Principles in Mental-Health Conversation

Abdullah Mazhar; Het Riteshkumar Shah; Aseem Srivastava; Smriti Joshi; Md Shad Akhtar

arXiv:2604.05795 · cs.CL · April 15, 2026

TL;DR

This paper introduces CARE, a structured evaluation framework that assesses AI-generated therapeutic responses against core psychotherapeutic principles. CARE improves over baseline models and prioritizes clinical fidelity over surface-level fluency.

Contribution

The paper presents FAITH-M, a new benchmark with expert ratings, and CARE, a multi-stage reasoning framework that enhances evaluation of therapeutic principles in AI mental health responses.

Findings

01

CARE achieves an F-1 score of 63.34, a 64.26% relative improvement over the Qwen3 baseline.

02

Structured reasoning and contextual modeling significantly improve evaluation accuracy.

03

The framework demonstrates robustness under domain shift and highlights challenges in modeling clinical nuances.
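
A note on the first finding: the 64.26 figure cannot be an absolute point gain, since it is larger than the 63.34 score itself. Reading it as a percentage relative improvement, the baseline score can be back-computed. The sketch below does that arithmetic under this assumption; the function name `implied_baseline` is illustrative and does not come from the paper.

```python
def implied_baseline(score: float, relative_gain_pct: float) -> float:
    """Return baseline b such that score == b * (1 + relative_gain_pct / 100).

    Assumes the reported gain is a relative improvement in percent.
    """
    return score / (1.0 + relative_gain_pct / 100.0)


care_f1 = 63.34        # CARE F-1 score reported in the findings
relative_gain = 64.26  # assumed to be percent relative improvement

baseline_f1 = implied_baseline(care_f1, relative_gain)
print(round(baseline_f1, 2))  # 38.56
```

Under this reading, the Qwen3 baseline would sit at roughly 38.56 F-1, which is a plausible value, whereas an absolute gain of 64.26 points would imply a negative baseline.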

Abstract

The increasing use of large language models in mental health applications calls for principled evaluation frameworks that assess alignment with psychotherapeutic best practices beyond surface-level fluency. While recent systems exhibit conversational competence, they lack structured mechanisms to evaluate adherence to core therapeutic principles. In this paper, we study the problem of evaluating AI-generated therapist-like responses for clinically grounded appropriateness and effectiveness. We assess each therapist's utterance along six therapeutic principles, using a fine-grained ordinal scale: non-judgmental acceptance, warmth, respect for autonomy, active listening, reflective understanding, and situational appropriateness. We introduce FAITH-M, a benchmark annotated with expert-assigned ordinal ratings, and propose CARE, a multi-stage evaluation framework that integrates…
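
To make the annotation scheme concrete, here is a minimal sketch of how one rated utterance might be represented. The six principle names come from the abstract; everything else is an assumption for illustration, including the class name `UtteranceRating`, the field layout, and the 1 to 5 scale bounds (the abstract only says "fine-grained ordinal scale" and does not give the range).

```python
from dataclasses import dataclass

# The six therapeutic principles listed in the abstract.
PRINCIPLES = (
    "non_judgmental_acceptance",
    "warmth",
    "respect_for_autonomy",
    "active_listening",
    "reflective_understanding",
    "situational_appropriateness",
)


@dataclass(frozen=True)
class UtteranceRating:
    """Hypothetical record: one therapist utterance with one score per principle."""

    utterance: str
    scores: dict          # principle name -> integer ordinal score
    scale_min: int = 1    # ASSUMPTION: scale bounds are not stated in the abstract
    scale_max: int = 5    # ASSUMPTION

    def __post_init__(self):
        # Require exactly the six principles, no more and no fewer.
        if set(self.scores) != set(PRINCIPLES):
            raise ValueError("scores must cover exactly the six principles")
        # Require every score to be an integer inside the assumed scale.
        for name, value in self.scores.items():
            if not isinstance(value, int) or not (self.scale_min <= value <= self.scale_max):
                raise ValueError(f"{name}: score {value!r} is outside the scale")


example = UtteranceRating(
    utterance="It sounds like this week has been exhausting for you.",
    scores={name: 4 for name in PRINCIPLES},
)
print(len(example.scores))  # 6
```

Validating at construction time keeps malformed annotations (a missing principle or an out-of-range score) from reaching any downstream metric computation.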
