Vulnerability-Amplifying Interaction Loops: a systematic failure mode in AI chatbot mental-health interactions
Veith Weilnhammer, Kevin YC Hou, Lennart Luettgau, Christopher Summerfield, Raymond Dolan, Matthew M Nour

TL;DR
This paper introduces SIM-VAIL, a framework for evaluating AI chatbot safety in mental health contexts, revealing systematic vulnerability-amplifying loops across diverse user profiles and chatbot models.
Contribution
The study presents a scalable, multidimensional safety assessment framework for AI chatbots, identifying a new failure mode called Vulnerability-Amplifying Interaction Loops (VAILs).
Findings
Concerning chatbot behaviors are widespread across models and user profiles.
Risk behaviors tend to accumulate over multiple conversation turns.
Newer models show fewer vulnerabilities, but do not eliminate them.
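As a rough illustration of what turn-level scoring on multiple risk dimensions and accumulation across turns could look like, here is a minimal Python sketch. It is not the authors' implementation. The dimension names (`risk_dim_1` to `risk_dim_13`), the `TurnRating` class, the numeric score scale, and the choice to average across dimensions before summing over turns are all assumptions made for illustration. The page states only that there are 13 clinically relevant dimensions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical placeholder names: the source gives the count (13), not the labels.
RISK_DIMENSIONS = [f"risk_dim_{i}" for i in range(1, 14)]


@dataclass
class TurnRating:
    """Scores for one chatbot turn, one value per risk dimension (assumed numeric)."""
    turn: int
    scores: dict[str, float]


def cumulative_risk(ratings: list[TurnRating]) -> list[float]:
    """Return the running total of the mean per-turn risk score, in turn order.

    Averaging across dimensions and summing over turns is an assumed
    aggregation, chosen only to show how risk can build up over a conversation.
    """
    totals: list[float] = []
    running = 0.0
    for rating in sorted(ratings, key=lambda r: r.turn):
        mean_score = sum(rating.scores[d] for d in RISK_DIMENSIONS) / len(RISK_DIMENSIONS)
        running += mean_score
        totals.append(running)
    return totals
```

A curve from `cumulative_risk` that keeps rising over turns would correspond to the accumulation pattern described in the findings. Per-dimension curves could be computed the same way by skipping the averaging step.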
Abstract
Millions of users turn to consumer AI chatbots to discuss mental health and behavioral concerns. While this presents unprecedented opportunities to deliver population-level support, it also highlights an urgent need for rigorous and scalable safety evaluations. Here we introduce SIM-VAIL, an AI chatbot auditing framework that captures how harmful chatbot responses manifest across a range of mental health contexts. SIM-VAIL pairs a simulated user, harboring a distinct psychiatric vulnerability and conversational intent, with a frontier AI chatbot. It scores conversation turns on 13 clinically relevant risk dimensions, enabling context-dependent, temporally resolved safety assessment. Across 810 conversations, encompassing over 90,000 turn-level ratings and 30 psychiatric user profiles, we found evidence of concerning chatbot behavior across virtually all user phenotypes and most of the 9…
Taxonomy
Topics: Digital Mental Health Interventions · AI in Service Interactions · Artificial Intelligence in Healthcare and Education
