RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

Mélanie Roschewitz; Kenneth Styppa; Yitian Tao; Jiwoong Sohn; Jean-Benoit Delbrouck; Benjamin Gundersen; Nicolas Deperrois; Christian Bluethgen; Julia Vogt; Bjoern Menze; Farhad Nooralahzadeh; Michael Krauthammer; Michael Moor

arXiv:2604.15231·cs.AI·April 17, 2026

TL;DR

RadAgent is a tool-using AI agent that generates chest CT reports through stepwise reasoning, providing inspectable decision traces and improving accuracy, robustness, and faithfulness over its 3D VLM counterpart, CT-Chat.

Contribution

The paper introduces RadAgent, a tool-using AI agent that generates chest CT reports step by step and accompanies each report with an inspectable trace of intermediate decisions and tool interactions, so clinicians can examine how the reported findings are derived.

Findings

01

RadAgent improves macro-F1 by 6.0 points and micro-F1 by 5.4 points over CT-Chat.

02

Robustness under adversarial conditions improves by 24.7 points over CT-Chat.

03

RadAgent achieves 37.0% faithfulness, a new capability in this domain.

Abstract

Vision-language models (VLMs) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or refine. To address this, we introduce RadAgent, a tool-using AI agent that generates CT reports through a stepwise and interpretable process. Each resulting report is accompanied by a fully inspectable trace of intermediate decisions and tool interactions, allowing clinicians to examine how the reported findings are derived. In our experiments, we observe that RadAgent improves chest CT report generation over its 3D VLM counterpart, CT-Chat, across three dimensions. Clinical accuracy improves by 6.0 points (36.4% relative) in macro-F1 and 5.4 points (19.6% relative) in…
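
The paired absolute and relative gains quoted above can be read together: dividing the absolute gain in points by the relative gain gives the baseline score that the pair implies. The sketch below works through that arithmetic for the two F1 figures. The helper name `implied_baseline` is hypothetical, and the resulting baselines are back-of-envelope inferences from rounded numbers, not values reported on this page.

```python
def implied_baseline(abs_gain_points: float, rel_gain_percent: float) -> float:
    """Baseline score (in points) implied by an absolute gain and its relative size.

    Uses: relative gain = absolute gain / baseline.
    Assumption: both inputs are rounded, so the result is only approximate.
    """
    return abs_gain_points / (rel_gain_percent / 100.0)


# Gains quoted in the abstract (RadAgent over CT-Chat).
macro_gain, macro_rel = 6.0, 36.4
micro_gain, micro_rel = 5.4, 19.6

# Inferred baselines and the scores they would imply for RadAgent.
macro_baseline = implied_baseline(macro_gain, macro_rel)
micro_baseline = implied_baseline(micro_gain, micro_rel)
macro_after = macro_baseline + macro_gain
micro_after = micro_baseline + micro_gain

print(f"macro-F1: ~{macro_baseline:.1f} -> ~{macro_after:.1f}")
print(f"micro-F1: ~{micro_baseline:.1f} -> ~{micro_after:.1f}")
```

Read this way, the larger relative gain in macro-F1 reflects a lower implied starting point rather than a much larger absolute improvement.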

Code & Models

Models

RadAgent/radagent-qwen3-14b-lora
model · 31 downloads · 3 likes
