PSI-Bench: Towards Clinically Grounded and Interpretable Evaluation of Depression Patient Simulators

Nguyen Khoi Hoang; Shuhaib Mehri; Tse-An Hsu; Yi-Jyun Sun; Quynh Xuan Nguyen Truong; Khoa D Doan; Dilek Hakkani-Tür

arXiv:2604.25840·cs.CL·April 29, 2026

TL;DR

This paper introduces PSI-Bench, an interpretable evaluation framework for depression patient simulators, revealing their limitations and guiding future improvements in mental health training tools.

Contribution

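The paper presents PSI-Bench, a clinically grounded, automatic evaluation framework that assesses depression patient simulators at the turn, dialogue, and population levels. It addresses two shortcomings of prior evaluations: heavy reliance on LLM judges with poorly specified prompts, and no assessment of behavioral diversity.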

Findings

01. Simulators tend to produce overly long, lexically diverse responses.

02. Simulators show reduced behavioral variability and quick emotion resolution.

03. The simulation framework impacts fidelity more than model size.

Abstract

Patient simulators are gaining traction in mental health training by providing scalable exposure to complex and sensitive patient interactions. Simulating depressed patients is particularly challenging, as safety constraints and high patient variability complicate simulations and underscore the need for simulators that capture diverse and realistic patient behaviors. However, existing evaluations heavily rely on LLM-judges with poorly specified prompts and do not assess behavioral diversity. We introduce PSI-Bench, an automatic evaluation framework that provides interpretable, clinically grounded diagnostics of depression patient simulator behavior across turn-, dialogue-, and population-level dimensions. Using PSI-Bench, we benchmark seven LLMs across two simulator frameworks and find that simulators produce overly long, lexically diverse responses, show reduced variability, resolve…
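To make the turn-, dialogue-, and population-level distinction concrete, here is a minimal Python sketch of the kind of interpretable statistics such an evaluation could compute. It is not PSI-Bench's implementation: the function names, the regex tokenizer, and the choice of token count and type-token ratio as proxies for response length and lexical diversity are all assumptions made for illustration, and the example dialogues are invented.

```python
import re
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def turn_metrics(turn):
    """Turn level: token count and type-token ratio (unique tokens / tokens)."""
    tokens = tokenize(turn)
    ttr = len(set(tokens)) / len(tokens) if tokens else 0.0
    return {"length": len(tokens), "ttr": ttr}


def dialogue_metrics(turns):
    """Dialogue level: averages over the patient turns of one dialogue."""
    per_turn = [turn_metrics(t) for t in turns]
    return {
        "mean_length": mean(m["length"] for m in per_turn),
        "mean_ttr": mean(m["ttr"] for m in per_turn),
    }


def population_spread(dialogues):
    """Population level: std dev of mean turn length across dialogues.

    A low value would suggest that simulated patients behave alike.
    """
    return pstdev(dialogue_metrics(d)["mean_length"] for d in dialogues)


# Hypothetical patient turns, invented for this example.
reference = [["i don't know", "tired i guess"], ["not great"]]
simulated = [
    ["Lately I have been feeling overwhelmed, exhausted, and hopeless."],
    ["Honestly everything seems pointless and I struggle to get up."],
]

for name, dialogues in [("reference", reference), ("simulated", simulated)]:
    lengths = [dialogue_metrics(d)["mean_length"] for d in dialogues]
    print(name, "mean turn length:", mean(lengths),
          "spread:", population_spread(dialogues))
```

Comparing these numbers between simulator output and reference patient dialogues is one plausible way to surface the patterns the abstract reports (longer responses, reduced variability). The paper's actual metrics may differ.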
