An autonomous agent for auditing and improving the reliability of clinical AI models
Lukas Kuhn, Florian Buettner

TL;DR
This paper introduces ModelAuditor, an autonomous, interpretable agent that identifies failure modes of clinical AI models under real-world distribution shifts and suggests improvements, running on consumer hardware in under 10 minutes.
Contribution
We present ModelAuditor, a self-reflective agent that audits clinical AI models for failure modes and provides actionable insights, replacing the bespoke, time-consuming reliability checks practitioners currently rely on.
Findings
Successfully identified failure modes in three clinical scenarios.
Recovered 15-25% of performance lost due to distribution shifts.
Operates on consumer hardware in under 10 minutes at low cost.
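The "recovered 15-25%" finding is a fraction of the performance lost under distribution shift, not an absolute accuracy gain. The page does not give the exact definition, so the sketch below assumes the common one, (mitigated − shifted) / (clean − shifted); the function name and the numbers in the example are hypothetical and are not results from the paper.

```python
def recovered_fraction(clean: float, shifted: float, mitigated: float) -> float:
    """Fraction of the clean -> shifted performance drop won back by a mitigation.

    Assumed definition (not confirmed by the paper):
        (mitigated - shifted) / (clean - shifted)
    """
    lost = clean - shifted
    if lost <= 0:
        # Nothing was lost under the shift, so "recovery" is undefined.
        raise ValueError("no performance was lost under the shift")
    return (mitigated - shifted) / lost


# Hypothetical example: accuracy 0.90 in-distribution, 0.70 under shift,
# 0.74 after mitigation -> 0.04 of the 0.20 lost is recovered.
print(round(recovered_fraction(0.90, 0.70, 0.74), 2))  # 0.2
```

Under this reading, a 20% recovery means a model that dropped 20 accuracy points regains 4 of them.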
Abstract
The deployment of AI models in clinical practice faces a critical challenge: models achieving expert-level performance on benchmarks can fail catastrophically when confronted with real-world variations in medical imaging. Minor shifts in scanner hardware, lighting, or demographics can erode accuracy, yet reliability auditing to identify such catastrophic failure cases before deployment is currently a bespoke and time-consuming process. Practitioners lack accessible and interpretable tools to expose and repair hidden failure modes. Here we introduce ModelAuditor, a self-reflective agent that converses with users, selects task-specific metrics, and simulates context-dependent, clinically relevant distribution shifts. ModelAuditor then generates interpretable reports explaining how much performance likely degrades during deployment, discussing specific likely failure modes and identifying…
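The core audit step the abstract describes (simulate a distribution shift, re-evaluate a chosen metric, report the degradation) can be illustrated with a minimal sketch. This is not ModelAuditor's implementation, which is a conversational agent that chooses metrics and shifts per task; here the toy "images", the threshold classifier, the brightness shifts standing in for lighting or scanner changes, and the names `brightness`, `accuracy`, and `audit` are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[float]  # toy stand-in: flat list of pixel intensities in [0, 1]
Shift = Callable[[Image], Image]


def brightness(factor: float) -> Shift:
    """Hypothetical lighting/scanner shift: scale intensities and clip to [0, 1]."""
    return lambda img: [min(1.0, max(0.0, p * factor)) for p in img]


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def audit(model: Callable[[Image], int], images: List[Image], labels: List[int],
          shifts: Dict[str, Shift], metric=accuracy) -> Tuple[float, Dict[str, dict]]:
    """Score the model on clean data, then on each shifted copy, and report the drop."""
    baseline = metric([model(x) for x in images], labels)
    report = {}
    for name, shift in shifts.items():
        score = metric([model(shift(x)) for x in images], labels)
        report[name] = {"score": score, "drop": baseline - score}
    return baseline, report


# Toy classifier (hypothetical): predict 1 when mean intensity exceeds 0.5.
def model(img: Image) -> int:
    return int(sum(img) / len(img) > 0.5)


images = [[0.2, 0.3], [0.4, 0.4], [0.6, 0.7], [0.8, 0.9]]
labels = [0, 0, 1, 1]
baseline, report = audit(model, images, labels,
                         {"darker": brightness(0.7), "brighter": brightness(1.3)})
print(baseline)  # 1.0
print(report)    # each shift flips one of four predictions: score 0.75, drop 0.25
```

A real audit would swap in the trained clinical model, a task-appropriate metric, and clinically motivated shifts; the loop structure is the part this sketch is meant to show.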
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · COVID-19 diagnosis using AI
