AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation

Nima Fathi; Amar Kumar; Tal Arbel

arXiv:2507.16940 · cs.CV · July 24, 2025

TL;DR

AURA is a novel multi-modal AI agent designed for medical image analysis that combines visual and linguistic explanations, enabling interactive, explainable, and clinically relevant reasoning in medical diagnostics.

Contribution

This work introduces AURA, the first visual linguistic explainability agent tailored for medical imaging, integrating multiple modules for segmentation, reasoning, and evaluation.

Findings

01. AURA enables dynamic interaction and hypothesis testing in medical image analysis.

02. AURA provides transparent, interpretable explanations for medical diagnoses.

03. AURA demonstrates improved diagnostic relevance and visual interpretability.

Abstract

Recent advancements in Large Language Models (LLMs) have catalyzed a paradigm shift from static prediction systems to agentic AI agents capable of reasoning, interacting with tools, and adapting to complex tasks. While LLM-based agentic systems have shown promise across many domains, their application to medical imaging remains in its infancy. In this work, we introduce AURA, the first visual linguistic explainability agent designed specifically for comprehensive analysis, explanation, and evaluation of medical images. By enabling dynamic interactions, contextual explanations, and hypothesis testing, AURA represents a significant advancement toward more transparent, adaptable, and clinically aligned AI systems. We highlight the promise of agentic AI in transforming medical image analysis from static predictions to interactive decision support. Leveraging Qwen-32B, an LLM-based…

Taxonomy

Topics: Semantic Web and Ontologies · AI-based Problem Solving and Planning · Multi-Agent Systems and Negotiation