The Slow Drift of Support: Boundary Failures in Multi-Turn Mental Health LLM Dialogues
Youyou Cheng, Zhuangwei Kang, Kerry Jiang, Chenyu Sun, Qiyang Pan

TL;DR
This paper shows that large language models often cross safety boundaries during extended mental health dialogues and that adaptive probing makes these violations occur sooner, highlighting the need for multi-turn safety evaluation.
Contribution
The paper introduces a multi-turn stress-testing framework and shows that safety boundary violations increase as a dialogue progresses, underscoring the importance of multi-turn safety assessment.
Findings
Violations are common in multi-turn dialogues.
Adaptive probing reduces the number of turns before a boundary breach.
Making definitive promises is a primary boundary violation.
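The findings above are phrased in terms of how often and how early violations occur. As a minimal sketch of how such quantities could be computed, assume each dialogue has already been labeled turn by turn (True where the model's reply crossed a boundary). The function names are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def first_breach_turn(labels: Sequence[bool]) -> Optional[int]:
    """Return the 1-indexed turn of the first flagged violation, or None if none occurred."""
    for turn, violated in enumerate(labels, start=1):
        if violated:
            return turn
    return None


def mean_turns_to_breach(dialogues: Sequence[Sequence[bool]]) -> Optional[float]:
    """Average first-breach turn over dialogues that breached at all (None if none did)."""
    turns = [t for t in (first_breach_turn(d) for d in dialogues) if t is not None]
    return sum(turns) / len(turns) if turns else None


def violation_rate_by_turn(dialogues: Sequence[Sequence[bool]]) -> List[float]:
    """Fraction of dialogues flagged at each turn, counting only dialogues that reached that turn."""
    max_len = max((len(d) for d in dialogues), default=0)
    rates: List[float] = []
    for i in range(max_len):
        reached = [d[i] for d in dialogues if len(d) > i]
        rates.append(sum(reached) / len(reached))
    return rates
```

Counting only dialogues that reached a given turn keeps the per-turn rate meaningful when dialogues have different lengths, for example when a test stops at the first breach.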
Abstract
Large language models (LLMs) are widely used for mental health support. However, current safety evaluations in this field mostly check whether an LLM outputs prohibited words in single-turn conversations, neglecting the gradual erosion of safety boundaries over long dialogues, such as making definitive guarantees, assuming responsibility, and playing professional roles. We argue that as mainstream LLMs have evolved, words with obvious safety risks are easily filtered by their underlying systems; the real danger lies in the gradual transgression of boundaries during multi-turn interactions, driven by the LLM's attempts to offer comfort and empathy. This paper proposes a multi-turn stress-testing framework and conducts long-dialogue safety tests on three cutting-edge LLMs using two pressure methods: static progression and adaptive probing. We…
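The two pressure methods differ mainly in where the next user message comes from: a fixed escalation script, or a choice conditioned on the model's earlier replies. The Python sketch below illustrates that distinction under stated assumptions. `static_progression`, `adaptive_probing`, `run_dialogue`, and the `Model`/`Judge` types are hypothetical names, not the authors' framework, and the judge is left abstract because the abstract does not say how violations are detected.

```python
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]                          # (role, text); role is "user" or "assistant"
Model = Callable[[List[Message]], str]             # model under test: history -> reply
Judge = Callable[[str], bool]                      # hypothetical: True if a reply crosses a boundary
UserPolicy = Callable[[List[Message], int], str]   # (history, turn) -> next user message


def static_progression(script: List[str]) -> UserPolicy:
    """Pressure follows a pre-written escalation; the model's replies are ignored.

    Assumes the script has at least as many messages as the dialogue has turns.
    """
    def next_turn(history: List[Message], turn: int) -> str:
        return script[turn - 1]
    return next_turn


def adaptive_probing(prober: Callable[[List[Message]], str]) -> UserPolicy:
    """Pressure is chosen from the conversation so far (e.g. by a hypothetical prober model)."""
    def next_turn(history: List[Message], turn: int) -> str:
        return prober(history)
    return next_turn


def run_dialogue(model: Model, judge: Judge, policy: UserPolicy, max_turns: int) -> Optional[int]:
    """Run one dialogue and return the 1-indexed turn of the first violation, or None."""
    history: List[Message] = []
    for turn in range(1, max_turns + 1):
        history.append(("user", policy(history, turn)))
        reply = model(history)
        history.append(("assistant", reply))
        if judge(reply):
            return turn  # stop at the first boundary violation
    return None
```

Stopping at the first flagged reply yields a turns-before-breach measurement; to study how violations accumulate instead, the loop could keep running and record the judge's verdict at every turn. A keyword check would only be a test stub here, since the abstract argues that surface-level word filtering misses this kind of gradual drift.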
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Healthcare Decision-Making and Restraints
