CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological Counseling

Bichen Wang; Yixin Sun; Junzhe Wang; Hao Yang; Xing Fu; Yanyan Zhao; Si Wei; Shijin Wang; Bing Qin

arXiv:2511.09407 · cs.CL · November 13, 2025

TL;DR

CARE-Bench is a new benchmark for evaluating LLMs in psychological counseling, using diverse simulated clients and multidimensional metrics to better assess model performance and guide future improvements.

Contribution

It introduces CARE-Bench, a dynamic, expert-guided client simulation benchmark with multidimensional evaluation for assessing the counseling capabilities of LLMs.

Findings

1. Current LLMs show limitations in handling diverse clients.
2. CARE-Bench reveals specific weaknesses in existing models.
3. The analysis offers guidance for developing more effective counseling models.

Abstract

The mismatch between the growing demand for psychological counseling and the limited availability of services has motivated research into the application of Large Language Models (LLMs) in this domain. Consequently, there is a need for a robust and unified benchmark to assess the counseling competence of various LLMs. Existing works, however, are limited by unprofessional client simulation, static question-and-answer evaluation formats, and unidimensional metrics. These limitations hinder their effectiveness in assessing a model's comprehensive ability to handle diverse and complex clients. To address this gap, we introduce CARE-Bench, a dynamic and interactive automated benchmark. It is built upon diverse client profiles derived from real-world counseling cases and simulated according to expert guidelines. CARE-Bench provides a multidimensional performance evaluation grounded…

Taxonomy

Topics: Mental Health via Writing · Digital Mental Health Interventions · Topic Modeling