Improving the Safety and Trustworthiness of Medical AI via Multi-Agent Evaluation Loops

Zainab Ghafoor; Md Shafiqul Islam; Koushik Howlader; Md Rasel Khondokar; Tanusree Bhattacharjee; Sayantan Chakraborty; Adrito Roy; Ushashi Bhattacharjee; and Tirtho Roy

arXiv:2601.13268·cs.AI·January 21, 2026

TL;DR

This paper presents a multi-agent iterative framework to improve the safety, ethical compliance, and reliability of medical AI models, demonstrating significant reductions in ethical violations and risk levels across diverse clinical queries.

Contribution

Introduces a novel multi-agent refinement system that enhances medical LLM safety through structured, iterative alignment with ethical principles and risk assessments.

Findings

01  89% reduction in ethical violations
02  92% risk downgrade rate
03  Faster convergence with DeepSeek R1
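
The page reports the first two findings as percentages without defining them. The sketch below shows one plausible way such figures could be computed. It is an assumption, not the authors' definition: the `QueryOutcome` record, the SRA-5 tier orientation (1 = lowest risk, 5 = highest), and the choice to measure the downgrade rate only over queries that started above tier 1 are all hypothetical. The toy records are illustrative and are not data from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryOutcome:
    """Hypothetical per-query record of evaluator results before and after refinement."""
    violations_before: int  # ethics violations flagged on the initial response
    violations_after: int   # violations flagged on the final, refined response
    tier_before: int        # SRA-5 tier of the initial response (assumed 1 = lowest risk)
    tier_after: int         # SRA-5 tier of the final response


def violation_reduction(outcomes: List[QueryOutcome]) -> float:
    """Relative drop in total flagged violations, as a fraction in [0, 1]."""
    before = sum(o.violations_before for o in outcomes)
    after = sum(o.violations_after for o in outcomes)
    return 0.0 if before == 0 else (before - after) / before


def risk_downgrade_rate(outcomes: List[QueryOutcome]) -> float:
    """Share of initially elevated-risk queries (tier > 1) whose tier decreased."""
    risky = [o for o in outcomes if o.tier_before > 1]
    if not risky:
        return 0.0
    return sum(o.tier_after < o.tier_before for o in risky) / len(risky)


# Toy records for illustration only (not the paper's results).
toy = [
    QueryOutcome(violations_before=3, violations_after=0, tier_before=4, tier_after=2),
    QueryOutcome(violations_before=2, violations_after=1, tier_before=3, tier_after=3),
    QueryOutcome(violations_before=0, violations_after=0, tier_before=1, tier_after=1),
]
print(violation_reduction(toy))  # 0.8
print(risk_downgrade_rate(toy))  # 0.5
```

Pooling violation counts across queries (rather than averaging per-query reductions) avoids dividing by zero for queries that were already clean; the paper may aggregate differently.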

Abstract

Large Language Models (LLMs) are increasingly applied in healthcare, yet ensuring their ethical integrity and safety compliance remains a major barrier to clinical deployment. This work introduces a multi-agent refinement framework designed to enhance the safety and reliability of medical LLMs through structured, iterative alignment. Our system combines two generative models (DeepSeek R1 and Med-PaLM) with two evaluation agents (LLaMA 3.1 and Phi-4), which assess responses using the American Medical Association's (AMA) Principles of Medical Ethics and a five-tier Safety Risk Assessment (SRA-5) protocol. We evaluate performance across 900 clinically diverse queries spanning nine ethical domains, measuring convergence efficiency, ethical violation reduction, and domain-specific risk behavior. Results demonstrate that DeepSeek R1 achieves faster convergence (mean 2.34 vs. 2.67…
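
The abstract describes a loop in which a generator drafts an answer, evaluator agents score it against the AMA principles and an SRA-5 risk tier, and the draft is revised until it passes. The sketch below illustrates that control flow under stated assumptions; it is not the authors' implementation. The stopping rule (no flagged violations from any evaluator and a worst-case tier of at most 2), the round cap, the feedback format, and the stub functions `toy_generator` and `toy_evaluator` are hypothetical stand-ins for the real model calls (DeepSeek R1 or Med-PaLM as generator, LLaMA 3.1 and Phi-4 as evaluators). The returned round count is the kind of per-query quantity that a mean convergence figure would average.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    violations: List[str] = field(default_factory=list)  # ethics principles judged violated
    risk_tier: int = 1  # SRA-5 tier, assumed 1 (lowest risk) .. 5 (highest risk)


Generator = Callable[[str, List[str]], str]   # (query, feedback) -> response
Evaluator = Callable[[str, str], Evaluation]  # (query, response) -> evaluation


def refine(query: str, generate: Generator, evaluators: List[Evaluator],
           max_tier: int = 2, max_rounds: int = 5) -> Tuple[str, int, bool]:
    """Regenerate until all evaluators pass the response; needs >= 1 evaluator.

    Returns (final response, rounds used, whether the loop converged).
    """
    feedback: List[str] = []
    response = ""
    for round_no in range(1, max_rounds + 1):
        response = generate(query, feedback)
        evals = [ev(query, response) for ev in evaluators]
        # Union of violations across evaluators; the strictest (highest) tier wins.
        violations = sorted({v for e in evals for v in e.violations})
        worst_tier = max(e.risk_tier for e in evals)
        if not violations and worst_tier <= max_tier:
            return response, round_no, True
        # Hypothetical feedback format passed back to the generator.
        feedback = violations + [f"risk tier {worst_tier} exceeds {max_tier}"]
    return response, max_rounds, False


def toy_generator(query: str, feedback: List[str]) -> str:
    """Stub generator: adds a referral sentence once it receives any feedback."""
    base = f"Answer to: {query}."
    return base + " Please consult a licensed clinician." if feedback else base


def toy_evaluator(query: str, response: str) -> Evaluation:
    """Stub evaluator: flags responses that lack a referral to a clinician."""
    if "clinician" in response:
        return Evaluation()
    return Evaluation(violations=["competence/referral"], risk_tier=3)


response, rounds, converged = refine(
    "Can I double my insulin dose?", toy_generator, [toy_evaluator, toy_evaluator]
)
print(rounds, converged)  # 2 True
```

Taking the worst tier across evaluators makes the loop conservative: a response converges only when every evaluator accepts it, which is one reasonable reading of a multi-evaluator safety gate.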

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Adversarial Robustness in Machine Learning