PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments
Zhuang Chen, Dazhen Wan, Zhangkai Zheng, Guanqun Bi, Xiyao Xiao, Binghang Li, Minlie Huang

TL;DR
PsychePass evaluates and improves LLM therapeutic skills through trajectory-anchored tournaments, addressing two instabilities in current assessment methods: process drift in client simulation and standard drift in static pointwise scoring.
Contribution
It proposes a unified approach combining simulation anchoring and tournament-based ranking to reliably evaluate and enhance LLM therapeutic competence.
Findings
Effective calibration of LLMs' therapeutic skills.
Strong alignment with human expert judgments.
Enables reinforcement learning for performance improvement.
Abstract
While large language models show promise in mental healthcare, evaluating their therapeutic competence remains challenging due to the unstructured and longitudinal nature of counseling. We argue that current evaluation paradigms suffer from an unanchored defect, leading to two forms of instability: process drift, where unsteered client simulation wanders away from specific counseling goals, and standard drift, where static pointwise scoring lacks the stability for reliable judgment. To address this, we introduce PsychePass, a unified framework that calibrates the therapeutic competence of LLMs via trajectory-anchored tournaments. We first anchor the interaction trajectory in simulation, where clients precisely control the fluid consultation process to probe multifaceted capabilities. We then anchor the battle trajectory in judgments through an efficient Swiss-system tournament, utilizing…
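The abstract names a Swiss-system tournament but is cut off before describing how it is run. The following is only a generic sketch of Swiss-style pairing, not the authors' procedure: in each round, competitors with similar running scores are paired, rematches are avoided where possible, and a pairwise judge picks the winner. The names `swiss_pairings`, `run_swiss`, and the `judge` callback are hypothetical, and the one-point-per-win scoring with no points for a bye is an assumption.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def swiss_pairings(
    scores: Dict[str, float], played: Set[FrozenSet[str]]
) -> List[Tuple[str, str]]:
    """Pair players with similar scores, avoiding rematches where possible."""
    # Highest score first; ties broken by name so the order is deterministic.
    unpaired = sorted(scores, key=lambda p: (-scores[p], p))
    pairs = []
    while len(unpaired) >= 2:
        a = unpaired.pop(0)
        # Take the closest-ranked opponent that `a` has not met yet;
        # if every remaining opponent is a rematch, fall back to the closest one.
        idx = next(
            (i for i, b in enumerate(unpaired) if frozenset((a, b)) not in played),
            0,
        )
        pairs.append((a, unpaired.pop(idx)))
    # With an odd number of players, the one left over sits out this round.
    return pairs


def run_swiss(
    players: List[str], judge: Callable[[str, str], str], rounds: int
) -> Dict[str, float]:
    """Run a fixed number of Swiss rounds; `judge(a, b)` returns the winner."""
    scores = {p: 0.0 for p in players}
    played: Set[FrozenSet[str]] = set()
    for _ in range(rounds):
        for a, b in swiss_pairings(scores, played):
            scores[judge(a, b)] += 1.0  # assumed scoring: one point per win
            played.add(frozenset((a, b)))
    return scores


if __name__ == "__main__":
    # Toy judge: a hidden skill value decides every comparison (illustrative only).
    skill = {"A": 4, "B": 3, "C": 2, "D": 1}
    final = run_swiss(list(skill), lambda a, b: a if skill[a] > skill[b] else b, rounds=2)
    print(sorted(final.items(), key=lambda kv: -kv[1]))
```

The appeal of this format is that n competitors play roughly n/2 comparisons per round for a small number of rounds, rather than the n(n-1)/2 comparisons of a full round-robin, which is presumably why the abstract calls it efficient.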
Taxonomy
Topics: Digital Mental Health Interventions · Mental Health via Writing · Machine Learning in Healthcare
