Mental Health AI Safety Claims Must Preserve Temporal Evidence
Srimonti Dutta, Ratna Kandala

TL;DR
This paper argues that safety evaluations of mental health AI must preserve temporal evidence. It introduces a formal framework, Temporal Safety Non-Identifiability, and an evaluation standard, SCOPE-MH, that expose failures missed by per-turn scoring.
Contribution
It introduces Temporal Safety Non-Identifiability and the SCOPE-MH standard, enabling safety claims to be aligned with the actual evidence retained during evaluation.
Findings
Temporal safety properties cannot be certified by protocols that discard sequence features.
SCOPE-MH reveals failure mechanisms not captured by per-turn scoring.
Evaluation preserving temporal evidence is essential for safe mental health AI deployment.
Abstract
The safety of mental health AI is often judged at the wrong temporal scale. Current evaluations typically score isolated responses, endpoint outcomes, or aggregate dialogue quality, while clinically consequential failures may arise from the order and accumulation of interactions themselves, including delayed escalation, repeated reinforcement, dependency formation, failed repair, and gradual deterioration across turns. This paper argues that this mismatch is not merely a limitation of evaluation coverage but a source of invalid safety conclusions. We introduce Temporal Safety Non-Identifiability, a formal account of why safety properties that depend on sequence, timing, accumulation, or recovery cannot be certified by protocols that discard those features. From this formalization, we develop SCOPE (Safety Claims Over Preserved Evidence) as a general principle for aligning safety claims…
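The non-identifiability argument can be illustrated with a toy example. The sketch below is not the paper's formalism or the SCOPE-MH standard; the per-turn scores, the two dialogues, and the function names `per_turn_aggregate` and `longest_decline` are all hypothetical. It shows only that two dialogues with opposite trajectories become indistinguishable once a protocol discards turn order.

```python
from statistics import mean

def per_turn_aggregate(scores):
    """Order-discarding protocol (hypothetical): keep only mean and minimum score."""
    return (mean(scores), min(scores))

def longest_decline(scores):
    """Order-preserving feature (hypothetical): longest run of strictly falling scores."""
    best = run = 0
    for prev, cur in zip(scores, scores[1:]):
        run = run + 1 if cur < prev else 0
        best = max(best, run)
    return best

# Hypothetical per-turn safety scores (higher is safer) for two dialogues
# that contain the same scores in opposite order.
recovering = [2, 3, 4, 5]
deteriorating = [5, 4, 3, 2]

# The order-discarding summary is identical for both dialogues...
same_summary = per_turn_aggregate(recovering) == per_turn_aggregate(deteriorating)

# ...while the order-preserving feature separates them.
declines = (longest_decline(recovering), longest_decline(deteriorating))
```

Under these assumptions, any safety claim about gradual deterioration that is computed from `per_turn_aggregate` alone must give both dialogues the same verdict, whereas `longest_decline` needs the retained turn order to tell them apart.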
