3DMedAgent: Unified Perception-to-Understanding for 3D Medical Analysis

Ziyue Wang; Linghan Cai; Chang Han Low; Haofeng Liu; Junde Wu; Jingyu Wang; Rui Wang; Lei Song; Jiang Bian; Jingjing Fu; Yueming Jin

arXiv:2602.18064 · cs.CV · March 10, 2026

Open Access

TL;DR

3DMedAgent is a unified agent that enables 2D multimodal large language models (MLLMs) to analyze 3D CT scans by decomposing complex tasks into manageable steps, improving understanding and reasoning without 3D-specific fine-tuning.

Contribution

The paper introduces a unified agent that bridges 2D multimodal models and 3D medical analysis, using structured memory and multi-step reasoning.

Findings

01. Outperforms existing models on 40+ tasks

02. Effectively integrates heterogeneous visual and textual tools

03. Demonstrates scalable general-purpose 3D clinical analysis

Abstract

3D CT analysis spans a continuum from low-level perception to high-level clinical understanding. Existing 3D-oriented analysis methods adopt either isolated task-specific modeling or task-agnostic end-to-end paradigms to produce one-hop outputs, impeding the systematic accumulation of perceptual evidence for downstream reasoning. In parallel, recent multimodal large language models (MLLMs) exhibit improved visual perception and can integrate visual and textual information effectively, yet their predominantly 2D-oriented designs fundamentally limit their ability to perceive and analyze volumetric medical data. To bridge this gap, we propose 3DMedAgent, a unified agent that enables 2D MLLMs to perform general 3D CT analysis without 3D-specific fine-tuning. 3DMedAgent coordinates heterogeneous visual and textual tools through a flexible MLLM agent, progressively decomposing complex 3D…
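The coordination pattern the abstract describes (tools invoked step by step, with evidence accumulated for later reasoning) can be illustrated with a toy sketch. This is not the authors' implementation: the names `Memory`, `locate_tool`, `measure_tool`, and `run_agent` are hypothetical, the intensity threshold is arbitrary, and a fixed plan stands in for the MLLM that would choose the steps in a real agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy CT volume indexed as [z][y][x]; real volumes would be arrays of intensities.
Volume = List[List[List[int]]]


@dataclass
class Memory:
    """Hypothetical structured store for evidence gathered across steps."""
    entries: Dict[str, object] = field(default_factory=dict)


def locate_tool(volume: Volume, memory: Memory, threshold: int = 100) -> None:
    # Stand-in for a perception tool: record axial slices with bright voxels.
    memory.entries["slices"] = [
        z for z, axial in enumerate(volume)
        if any(v >= threshold for row in axial for v in row)
    ]


def measure_tool(volume: Volume, memory: Memory, threshold: int = 100) -> None:
    # Reuses earlier evidence: count bright voxels only in the located slices.
    slices = memory.entries.get("slices", [])
    memory.entries["voxel_count"] = sum(
        1 for z in slices for row in volume[z] for v in row if v >= threshold
    )


Tool = Callable[[Volume, Memory], None]
TOOLS: Dict[str, Tool] = {"locate": locate_tool, "measure": measure_tool}


def run_agent(volume: Volume, plan: List[str], tools: Dict[str, Tool]) -> Memory:
    # Assumption: the plan is fixed here; an MLLM would pick each step instead.
    memory = Memory()
    for step in plan:
        tools[step](volume, memory)
    return memory


volume: Volume = [
    [[0, 0], [0, 0]],
    [[0, 150], [120, 0]],
    [[0, 0], [0, 200]],
]
result = run_agent(volume, ["locate", "measure"], TOOLS)
print(result.entries)
```

The point of the sketch is the data flow: the second tool reads what the first one wrote to memory, so later steps build on earlier perceptual evidence rather than producing a one-hop output.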

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning