CareGuardAI: Context-Aware Multi-Agent Guardrails for Clinical Safety & Hallucination Mitigation in Patient-Facing LLMs

Elham Nasarian; Abhilash Neog; Kwok-Leung Tsui; Niyousha HosseiniChimeh

arXiv:2604.26959 · cs.CY · May 1, 2026

TL;DR

CareGuardAI is a novel framework that enhances safety and factual accuracy in patient-facing LLMs by assessing clinical risk and hallucination risk through a multi-stage, context-aware pipeline.

Contribution

CareGuardAI introduces a risk-aware safety framework that evaluates responses at inference time and refines them iteratively, improving the reliability of medical LLM responses in open-ended healthcare scenarios.

Findings

01 Outperforms baseline models on safety and hallucination benchmarks.

02 Employs a multi-stage pipeline with risk assessments for safe response generation.

03 Ensures responses meet clinical safety and factual reliability thresholds.
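The summary above describes inference-time evaluation against two risk dimensions, with iterative refinement until thresholds are met. The following is a minimal sketch of that general pattern, not the authors' implementation: the function names (`guarded_answer`, `generate`, `refine`, the two scorers), the 0-to-1 risk scale, the threshold values, and the round limit are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardrailResult:
    response: str
    clinical_risk: float
    hallucination_risk: float
    rounds: int       # number of refinement rounds applied
    accepted: bool    # True if both risks fell within their thresholds


def guarded_answer(
    question: str,
    generate: Callable[[str], str],
    score_clinical_risk: Callable[[str, str], float],
    score_hallucination_risk: Callable[[str, str], float],
    refine: Callable[[str, str, float, float], str],
    clinical_threshold: float = 0.3,        # assumed value, not from the paper
    hallucination_threshold: float = 0.3,   # assumed value, not from the paper
    max_rounds: int = 3,                    # assumed value, not from the paper
) -> GuardrailResult:
    """Generate a draft, score both risks, and refine until both pass or rounds run out."""
    response = generate(question)
    clinical = hallucination = 1.0
    for rounds in range(max_rounds + 1):
        clinical = score_clinical_risk(question, response)
        hallucination = score_hallucination_risk(question, response)
        if clinical <= clinical_threshold and hallucination <= hallucination_threshold:
            return GuardrailResult(response, clinical, hallucination, rounds, True)
        if rounds < max_rounds:
            response = refine(question, response, clinical, hallucination)
    # Thresholds were never met; the caller decides how to fall back.
    return GuardrailResult(response, clinical, hallucination, max_rounds, False)


# Toy stand-ins so the sketch runs without any model; real scorers would be LLM agents.
def toy_generate(question: str) -> str:
    return "draft [UNSAFE]"


def toy_clinical(question: str, response: str) -> float:
    return 0.9 if "[UNSAFE]" in response else 0.1


def toy_hallucination(question: str, response: str) -> float:
    return 0.1


def toy_refine(question: str, response: str, clinical: float, hallucination: float) -> str:
    return response.replace(" [UNSAFE]", "")


result = guarded_answer(
    "example patient question",
    toy_generate, toy_clinical, toy_hallucination, toy_refine,
)
print(result.accepted, result.rounds, result.response)
```

Returning an explicit `accepted` flag, rather than raising or silently returning the last draft, leaves the fallback policy (for example, escalation or refusal) to the caller; which policy CareGuardAI uses is not stated in this summary.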

Abstract

Integrating large language models (LLMs) into patient-facing healthcare systems offers significant potential to improve access to medical information. However, ensuring clinical safety and factual reliability remains a critical challenge. In practice, AI-generated responses may be conditionally correct yet medically inappropriate, as models often fail to interpret patient context and tend to produce agreeable responses rather than challenge unsafe assumptions. Unlike clinicians, who infer risk from incomplete information, LLMs frequently lack contextual awareness. Moreover, real-world patient interactions are open-ended and underspecified, unlike structured benchmark settings. We present CareGuardAI, a risk-aware safety framework for patient-facing medical question answering that addresses two key failure modes: clinical safety risk and hallucination risk. The framework introduces…
