CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering

Yuren Mao; Wenyi Xu; Yuyang Qin; Yunjun Gao

arXiv:2505.16229 · cs.CV · May 23, 2025

TL;DR

This paper introduces CT-Agent, a multimodal agentic framework for 3D CT radiology question answering that addresses anatomical complexity and cross-slice spatial relationships, outperforming existing systems on two datasets.

Contribution

The paper presents a novel multimodal agentic framework that effectively handles 3D CT data complexity and spatial relationships for improved radiology question answering.

Findings

01 Superior performance on CT-RATE dataset

02 Effective handling of anatomical complexity

03 Accurate spatial relationship capture

Abstract

A Computed Tomography (CT) scan produces 3D volumetric medical data that can be viewed as hundreds of cross-sectional images (a.k.a. slices) and provides detailed anatomical information for diagnosis. For radiologists, creating CT radiology reports is time-consuming and error-prone. A visual question answering (VQA) system that can answer radiologists' questions about some anatomical regions on the CT scan, and even automatically generate a radiology report, is urgently needed. However, existing VQA systems cannot adequately handle the CT radiology question answering (CTQA) task for two reasons: (1) anatomical complexity makes CT images difficult to understand; (2) spatial relationships across hundreds of slices are difficult to capture. To address these issues, this paper proposes CT-Agent, a multimodal agentic framework for CTQA. CT-Agent adopts anatomically independent tools to break down the anatomic…
