An autonomous agent for auditing and improving the reliability of clinical AI models

Lukas Kuhn; Florian Buettner

arXiv:2507.05755 · cs.AI · July 9, 2025

TL;DR

This paper introduces ModelAuditor, an autonomous, interpretable agent that identifies failure modes and suggests improvements for clinical AI models under real-world distribution shifts, enhancing reliability efficiently.

Contribution

We present ModelAuditor, a novel self-reflective agent that audits clinical AI models for failure modes and provides actionable insights, a significant advancement over existing bespoke reliability checks.

Findings

01. Successfully identified failure modes in three clinical scenarios.

02. Recovered 15-25% of performance lost due to distribution shifts.

03. Operates on consumer hardware in under 10 minutes at low cost.

Abstract

The deployment of AI models in clinical practice faces a critical challenge: models achieving expert-level performance on benchmarks can fail catastrophically when confronted with real-world variations in medical imaging. Minor shifts in scanner hardware, lighting, or demographics can erode accuracy, yet reliability auditing to identify such catastrophic failure cases before deployment is currently a bespoke and time-consuming process. Practitioners lack accessible and interpretable tools to expose and repair hidden failure modes. Here we introduce ModelAuditor, a self-reflective agent that converses with users, selects task-specific metrics, and simulates context-dependent, clinically relevant distribution shifts. ModelAuditor then generates interpretable reports explaining how much performance likely degrades during deployment, discussing specific likely failure modes and identifying…
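As a rough illustration of the idea in the abstract, the sketch below simulates one distribution shift (a uniform brightness increase) and measures how much accuracy a model loses under it. It is not ModelAuditor's implementation: the classifier `toy_model`, the synthetic data from `make_dataset`, the `brightness_shift` perturbation, and the `recovered_fraction` helper are all hypothetical names introduced here. The recovery formula is one plausible reading of "performance lost due to distribution shifts" (the gain from a repair divided by the drop caused by the shift), and the numbers passed to it are made up for the arithmetic only.

```python
import random


def toy_model(image, threshold=0.5):
    """Hypothetical stand-in classifier: predicts 1 if the mean intensity exceeds threshold."""
    return 1 if sum(image) / len(image) > threshold else 0


def brightness_shift(image, delta):
    """Simulated shift (assumption): add delta to every pixel and clip to [0, 1]."""
    return [min(1.0, max(0.0, p + delta)) for p in image]


def accuracy(model, images, labels):
    """Fraction of images whose predicted label matches the true label."""
    correct = sum(model(x) == y for x, y in zip(images, labels))
    return correct / len(labels)


def recovered_fraction(acc_clean, acc_shifted, acc_repaired):
    """Share of the accuracy lost under shift that a repair wins back (assumed definition)."""
    lost = acc_clean - acc_shifted
    if lost <= 0:
        return 0.0  # nothing was lost, so there is nothing to recover
    return (acc_repaired - acc_shifted) / lost


def make_dataset(n, rng):
    """Synthetic 16-pixel 'images': class 0 is dark (around 0.3), class 1 is bright (around 0.7)."""
    images, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.7 if label else 0.3
        pixels = [min(1.0, max(0.0, rng.gauss(center, 0.05))) for _ in range(16)]
        images.append(pixels)
        labels.append(label)
    return images, labels


rng = random.Random(0)
images, labels = make_dataset(200, rng)

acc_clean = accuracy(toy_model, images, labels)
shifted_images = [brightness_shift(x, 0.25) for x in images]
acc_shifted = accuracy(toy_model, shifted_images, labels)

print(f"clean accuracy:   {acc_clean:.2f}")
print(f"shifted accuracy: {acc_shifted:.2f}")

# Illustrative numbers only (not from the paper): a repair that lifts
# accuracy from 0.70 to 0.74 after a drop from 0.90 recovers 20% of the loss.
print(f"recovered: {recovered_fraction(0.90, 0.70, 0.74):.0%}")
```

The brightness increase pushes the dark class over the fixed decision threshold, so the toy model misclassifies it while still scoring perfectly on the unshifted data. That is the benchmark-versus-deployment gap the abstract describes, in miniature.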

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · COVID-19 diagnosis using AI