The Violation State: Safety State Persistence in a Multimodal Language Model Interface
Bentley DeVilling (Course Correct Labs)

TL;DR
This study shows that a safety refusal in a multimodal AI system such as ChatGPT can persist for the rest of a conversation and block unrelated, benign tasks, which raises concerns about safety-state management and system reliability.
Contribution
It documents the phenomenon of safety-state persistence in multimodal AI interfaces, highlighting how initial safety violations influence subsequent unrelated interactions.
Findings
96.67% of image-generation requests were refused after an initial copyright-violating request (watermark removal).
Control sessions showed no refusals, indicating that the effect stems from persistent safety state.
Safety refusals can persist across unrelated tasks in multimodal AI systems.
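A refusal rate like the one above is the fraction of image-generation requests labeled as refused within each condition. The sketch below shows one way such a tally could be computed. The condition names, the outcome labels, and the toy records are hypothetical placeholders for illustration only; they are not the study's data, and the listing does not give its trial counts.

```python
def refusal_rate(outcomes):
    """Return the fraction of outcomes labeled 'refused'."""
    if not outcomes:
        raise ValueError("no outcomes to summarize")
    refused = sum(1 for outcome in outcomes if outcome == "refused")
    return refused / len(outcomes)


# Hypothetical toy records (NOT the study's data): one label per
# image-generation request, grouped by experimental condition.
toy_sessions = {
    "violation_first": ["refused", "refused", "refused", "completed"],
    "control": ["completed", "completed", "completed", "completed"],
}

# Per-condition refusal rates, expressed as fractions in [0, 1].
rates = {cond: refusal_rate(labels) for cond, labels in toy_sessions.items()}

for cond, rate in rates.items():
    print(f"{cond}: {rate:.2%} refused")
```

Comparing the rate in the violation-first condition against the control condition is what separates a persistent safety state from a baseline tendency to refuse.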
Abstract
Multimodal AI systems integrate text generation, image generation, and other capabilities within a single conversational interface. These systems employ safety mechanisms to prevent disallowed actions, including the removal of watermarks from copyrighted images. While single-turn refusals are expected, the interaction between safety filters and conversation-level state is not well understood. This study documents a reproducible behavioral effect in the ChatGPT (GPT-5.1) web interface. Manual execution was chosen to capture the exact user-facing safety behavior of the production system, rather than isolated API components. When a conversation begins with an uploaded copyrighted image and a request to remove a watermark, which the model correctly refuses, subsequent prompts to generate unrelated, benign images are refused for the remainder of the session. Importantly, text-only requests…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
