InvisibleBench: A Deployment Gate for Caregiving Relationship AI

Ali Madad (GiveCare)

arXiv:2511.20733·cs.CY·November 27, 2025

Open Access

TL;DR

InvisibleBench is an evaluation framework for caregiving AI. It scores multi-turn conversations on safety, compliance, trauma-informed design, cultural fit, and memory, and applies those scores to four frontier models across 17 scenarios to expose safety gaps before deployment.

Contribution

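It introduces a deployment gate for caregiving-relationship AI: a benchmark of 3-20+ turn scenarios, with autofail conditions for missed crises, medical advice, harmful information, and attachment engineering, aimed at longitudinal safety and ethical risks in caregiving AI systems.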

Findings

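01. All models show significant safety gaps in crisis detection (11.8-44.8 percent).

02. DeepSeek Chat v3 achieves the highest overall score (75.9 percent).

03. Strengths differ by dimension: GPT-4o Mini leads Compliance (88.2 percent), Gemini leads Trauma-Informed Design (85.0 percent), and Claude Sonnet 4.5 ranks highest in crisis detection (44.8 percent).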

Abstract

InvisibleBench is a deployment gate for caregiving-relationship AI, evaluating 3-20+ turn interactions across five dimensions: Safety, Compliance, Trauma-Informed Design, Belonging/Cultural Fitness, and Memory. The benchmark includes autofail conditions for missed crises, medical advice (WOPR Act), harmful information, and attachment engineering. We evaluate four frontier models across 17 scenarios (N=68) spanning three complexity tiers. All models show significant safety gaps (11.8-44.8 percent crisis detection), indicating the necessity of deterministic crisis routing in production systems. DeepSeek Chat v3 achieves the highest overall score (75.9 percent), while strengths differ by dimension: GPT-4o Mini leads Compliance (88.2 percent), Gemini leads Trauma-Informed Design (85.0 percent), and Claude Sonnet 4.5 ranks highest in crisis detection (44.8 percent). We release all scenarios,…
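The gating idea in the abstract, where an autofail condition overrides the dimension scores, can be illustrated with a short sketch. This is not the authors' implementation: the names `ScenarioResult` and `gate`, the unweighted mean over the five dimensions, and the 0.7 pass threshold are all assumptions made for illustration, since the abstract does not state how dimensions are weighted or where the pass line sits.

```python
from dataclasses import dataclass, field

# The five dimensions and four autofail conditions named in the abstract.
DIMENSIONS = ("safety", "compliance", "trauma_informed", "belonging", "memory")
AUTOFAILS = ("missed_crisis", "medical_advice",
             "harmful_information", "attachment_engineering")


@dataclass
class ScenarioResult:
    """Hypothetical container for one model's result on one scenario."""
    scores: dict                                  # dimension -> score in [0, 1]
    autofails: set = field(default_factory=set)   # triggered autofail names


def gate(result, threshold=0.7):
    """Return (passed, overall_score).

    Assumption: any autofail zeroes the scenario; otherwise the overall
    score is an unweighted mean of the five dimension scores.
    """
    unknown = result.autofails - set(AUTOFAILS)
    if unknown:
        raise ValueError(f"unknown autofail conditions: {sorted(unknown)}")
    if result.autofails:
        return False, 0.0
    overall = sum(result.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    return overall >= threshold, overall


# The evaluation size in the abstract: 4 models x 17 scenarios = 68 runs.
N_RUNS = 4 * 17
```

Under these assumptions, a scenario with high dimension scores still fails the gate if a crisis was missed, which mirrors the abstract's point that averaged scores alone cannot stand in for crisis handling.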

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Digital Mental Health Interventions · Ethics and Social Impacts of AI