A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering

Ziruo Yi; Jinyu Liu; Ting Xiao; Mark V. Albert

arXiv:2508.02841 · cs.AI · August 6, 2025

TL;DR

This paper presents a multi-agent system for radiology visual question answering (RVQA) on chest X-ray images. The system improves reasoning, accuracy, and interpretability, and it targets two weaknesses of current methods: factual correctness and cross-modal alignment.

Contribution

The paper introduces a multi-agent architecture for complex reasoning in RVQA, with specialized agents for context understanding, multimodal reasoning, and answer validation, and reports improvements over existing multimodal large language model (MLLM) approaches.

Findings

01. Outperforms strong MLLM baselines on RVQA tasks
02. Demonstrates improved factual accuracy and interpretability
03. Remains effective on challenging cases curated via model-disagreement filtering
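Finding 03 refers to an evaluation set built by model-disagreement filtering, described in the abstract as "consistently hard cases across multiple MLLMs." This page does not state the exact selection rule. The sketch below is therefore an assumption: it keeps a question when at least a chosen fraction of a pool of models misses the reference answer. The function name `filter_hard_cases`, the `min_wrong_fraction` threshold, and the case-insensitive answer matching are all hypothetical illustrations, not the authors' procedure.

```python
from typing import Dict, List


def _normalise(answer: str) -> str:
    # Assumed matching rule: ignore case and surrounding whitespace.
    return answer.strip().lower()


def filter_hard_cases(
    predictions: Dict[str, List[str]],
    ground_truth: List[str],
    min_wrong_fraction: float = 1.0,
) -> List[int]:
    """Return indices of questions that enough models answer incorrectly.

    `predictions` maps a (hypothetical) model name to its answers, aligned
    with `ground_truth`. With the default threshold of 1.0, a question is
    kept only if every model gets it wrong. This rule is illustrative only.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n = len(ground_truth)
    for name, answers in predictions.items():
        if len(answers) != n:
            raise ValueError(f"{name}: expected {n} answers, got {len(answers)}")

    hard = []
    for i, truth in enumerate(ground_truth):
        # Count how many models miss the reference answer for question i.
        wrong = sum(
            _normalise(answers[i]) != _normalise(truth)
            for answers in predictions.values()
        )
        if wrong / len(predictions) >= min_wrong_fraction:
            hard.append(i)
    return hard


# Toy data: three made-up models answering three made-up questions.
preds = {
    "model_a": ["yes", "left", "no finding"],
    "model_b": ["yes", "right", "pneumonia"],
    "model_c": ["no", "left", "atelectasis"],
}
gt = ["yes", "left", "pleural effusion"]

print(filter_hard_cases(preds, gt))       # [2]
print(filter_hard_cases(preds, gt, 0.3))  # [0, 1, 2]
```

Raising the threshold toward 1.0 keeps only questions that the whole model pool fails on. That matches the "consistently hard" wording, but the paper may instead measure disagreement among the models' own answers.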

Abstract

Radiology visual question answering (RVQA) provides precise answers to questions about chest X-ray images, alleviating radiologists' workload. While recent methods based on multimodal large language models (MLLMs) and retrieval-augmented generation (RAG) have shown promising progress in RVQA, they still face challenges in factual accuracy, hallucinations, and cross-modal misalignment. We introduce a multi-agent system (MAS) designed to support complex reasoning in RVQA, with specialized agents for context understanding, multimodal reasoning, and answer validation. We evaluate our system on a challenging RVQA set curated via model disagreement filtering, comprising consistently hard cases across multiple MLLMs. Extensive experiments demonstrate the superiority and effectiveness of our system over strong MLLM baselines, with a case study illustrating its reliability and interpretability.…
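The abstract names three specialized agents: context understanding, multimodal reasoning, and answer validation. A minimal sketch of how such agents might be chained is shown below, assuming a sequential flow in which a failed validation sends feedback back to the reasoning agent for another attempt. The `Case` record, the agent signatures, `run_agents`, and `max_rounds` are hypothetical. In a real system each agent would wrap an (M)LLM call; here they are toy functions that only demonstrate the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Case:
    image_path: str                   # chest X-ray for this question
    question: str
    context: str = ""                 # set by the context-understanding agent
    answer: Optional[str] = None      # set by the multimodal-reasoning agent
    rationale: str = ""
    feedback: List[str] = field(default_factory=list)  # validator notes
    validated: bool = False


# Hypothetical agent interfaces (not the paper's API).
ContextAgent = Callable[[Case], str]
ReasoningAgent = Callable[[Case], Tuple[str, str]]  # -> (answer, rationale)
ValidationAgent = Callable[[Case], Optional[str]]   # None = accept, else feedback


def run_agents(
    case: Case,
    understand: ContextAgent,
    reason: ReasoningAgent,
    validate: ValidationAgent,
    max_rounds: int = 3,
) -> Case:
    """Context understanding, then reasoning/validation rounds until accepted."""
    case.context = understand(case)
    for _ in range(max_rounds):
        case.answer, case.rationale = reason(case)
        feedback = validate(case)
        if feedback is None:
            case.validated = True
            return case
        case.feedback.append(feedback)  # visible to the reasoning agent next round
    return case  # never accepted; `validated` stays False


# Toy stand-ins that only exercise the loop.
def toy_understand(case: Case) -> str:
    return "The question asks about laterality."


def toy_reason(case: Case) -> Tuple[str, str]:
    if case.feedback:
        return "left", "Revised after validator feedback."
    return "right", "First attempt."


def toy_validate(case: Case) -> Optional[str]:
    return None if case.answer == "left" else "Toy check failed; revise the answer."


result = run_agents(
    Case("cxr_0001.png", "Which side is the effusion on?"),
    toy_understand, toy_reason, toy_validate,
)
print(result.answer, result.validated, len(result.feedback))  # left True 1
```

Keeping validation as a separate agent means a rejected answer produces an explicit, inspectable feedback trail. This is one plausible way a multi-agent design can support the interpretability the abstract emphasizes.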