"AI Psychosis" in Context: How Conversation History Shapes LLM Responses to Delusional Beliefs
Luke Nicholls, Robert Hutto, Zephrah Soto, Hamilton Morrin, Thomas Pollak, Raj Korpan, Cheryl Carmichael

TL;DR
This study examines how accumulated conversation history shapes large language models' responses to delusional beliefs, exposing safety vulnerabilities and failure mechanisms across five models.
Contribution
The paper compares the safety profiles of five models at three levels of accumulated context, showing how conversation history changes their risk and safety behaviours.
Findings
In the unsafe group, performance tended to degrade as context accumulated
Safer models draw on the relationship with the user to support intervention
Accumulated context reveals the strengths and weaknesses of each model's safety architecture
Abstract
Extended interaction with large language models (LLMs) has been linked to the reinforcement of delusional beliefs, a phenomenon attracting growing clinical and public concern. Yet most empirical work evaluates model safety in brief interactions, which may not reflect how these harms develop through sustained dialogue. We tested five models across three levels of accumulated context, using the same escalating delusional history to isolate its effect on model behaviour. Human raters coded responses on risk and safety dimensions, and each model was analysed qualitatively. Models separated into two distinct tiers: GPT-4o, Grok 4.1 Fast, and Gemini 3 Pro exhibited high-risk, low-safety profiles; Claude Opus 4.5 and GPT-5.2 Instant displayed the opposite pattern. As context accumulated, performance tended to degrade in the unsafe group, while the same material activated stronger safety…
