TherapyProbe: Generating Design Knowledge for Relational Safety in Mental Health Chatbots Through Adversarial Simulation

Joydeep Chandra; Satyam Kumar Navneet; Yong Zhang

arXiv:2602.22775·cs.HC·February 27, 2026

Open Access

TL;DR

TherapyProbe is a methodology that uses adversarial multi-agent simulation to identify relational safety failures in mental health chatbots, providing a safety pattern library and design recommendations to improve long-term therapeutic interactions.

Contribution

The paper introduces a novel, cost-effective, simulation-based approach that systematically uncovers relational safety failures and develops a comprehensive safety pattern library for mental health chatbots.

Findings

01

Identified 23 relational safety failure archetypes.

02

Surfaced interaction patterns such as validation spirals and empathy fatigue.

03

Provided actionable design recommendations for safer chatbots.

Abstract

As mental health chatbots proliferate to address the global treatment gap, a critical question emerges: How do we design for relational safety, that is, the quality of interaction patterns that unfold across conversations rather than the correctness of individual responses? Current safety evaluations assess single-turn crisis responses, missing the therapeutic dynamics that determine whether chatbots help or harm over time. We introduce TherapyProbe, a design probe methodology that generates actionable design knowledge by systematically exploring chatbot conversation trajectories through adversarial multi-agent simulation. Using open-source models, TherapyProbe surfaces relational safety failures: interaction patterns like "validation spirals," where chatbots progressively reinforce hopelessness, or "empathy fatigue," where responses become mechanical over turns. Our contribution is translating…

Taxonomy

Topics: Digital Mental Health Interventions · Mental Health via Writing · Artificial Intelligence in Healthcare and Education