PRMB: Benchmarking Reward Models in Long-Horizon CBT-based Counseling Dialogue

Yougen Zhou; Qin Chen; Ningning Zhou; Jie Zhou; Liang He

arXiv:2603.11494 · cs.DB · March 13, 2026

TL;DR

This paper introduces PRMB, a benchmark for evaluating reward models in multi-session CBT counseling dialogues. It exposes limitations of current reward models and highlights the potential of generative reward models for mental health dialogue applications.

Contribution

The paper presents the first long-horizon, process-oriented benchmark for reward models in mental health dialogues, covering multi-session CBT counseling rather than isolated single-turn exchanges.

Findings

01. Positive correlation between benchmark scores and counseling performance

02. Reward models exhibit generalization defects not seen in previous benchmarks

03. Generative reward models show significant potential for mental health dialogue applications

Abstract

Large language models (LLMs) hold potential for mental healthcare applications, particularly in cognitive behavioral therapy (CBT)-based counseling, where reward models play a critical role in aligning LLMs with preferred therapeutic behaviors. However, existing reward model evaluations often fail to capture alignment effectiveness in long-horizon interventions due to limited coverage of process-oriented datasets and misalignment between evaluation targets and psychological alignment objectives. To address these limitations, we present PRMB, a comprehensive benchmark tailored for evaluating reward models in multi-session CBT counseling. PRMB spans 6 sessions and 21 diverse negative scenarios, incorporating both pairwise and Best-of-N preference evaluations. We demonstrate a positive correlation between our benchmark and downstream counseling dialogue performance. Based on our benchmark,…
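The abstract names two evaluation modes, pairwise and Best-of-N preference evaluation, without giving their scoring details here. The following is a minimal sketch of how such metrics are commonly computed, not the authors' implementation. The function names, the data layout (`context`, `chosen`, `rejected`), and the toy length-based scorer are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a reward model maps (dialogue context, response) to a scalar.
ScoreFn = Callable[[str, str], float]


def pairwise_accuracy(score: ScoreFn,
                      pairs: Sequence[Tuple[str, str, str]]) -> float:
    """Fraction of (context, chosen, rejected) triples where chosen scores higher."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = sum(
        1 for ctx, chosen, rejected in pairs
        if score(ctx, chosen) > score(ctx, rejected)
    )
    return correct / len(pairs)


def best_of_n_accuracy(score: ScoreFn,
                       items: Sequence[Tuple[str, str, Sequence[str]]]) -> float:
    """Fraction of items where chosen outscores every one of its rejected candidates.

    Assumption: an item counts as correct only if the chosen response beats
    all N-1 rejected responses, which makes this stricter than pairwise accuracy.
    """
    if not items:
        raise ValueError("items must be non-empty")
    correct = sum(
        1 for ctx, chosen, rejected in items
        if all(score(ctx, chosen) > score(ctx, r) for r in rejected)
    )
    return correct / len(items)


def toy_score(context: str, response: str) -> float:
    """Stand-in scorer (assumption): longer responses get higher reward."""
    return float(len(response))


if __name__ == "__main__":
    pairs = [("ctx", "a longer reply", "short"), ("ctx", "no", "longer")]
    items = [("ctx", "aaaa", ["a", "aa", "aaa"]), ("ctx", "aa", ["a", "aaa"])]
    print(pairwise_accuracy(toy_score, pairs))
    print(best_of_n_accuracy(toy_score, items))
```

In practice `toy_score` would be replaced by a call to the reward model under test. Under the all-candidates rule assumed here, a model can do well on pairwise comparisons and still score poorly on Best-of-N, since a single higher-scored rejected response fails the whole item.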

Taxonomy

Topics: Mental Health via Writing · Digital Mental Health Interventions · Topic Modeling