Exploring the Design Space of 3D MLLMs for CT Report Generation

Mohammed Baharoon; Jun Ma; Congyu Fang; Augustin Toma; Bo Wang

arXiv:2506.21535 · eess.IV · September 23, 2025

TL;DR

This paper systematically explores the design space of 3D multimodal large language models for radiology report generation, introducing knowledge-based augmentation methods and analyzing factors affecting performance.

Contribution

It provides a comprehensive investigation of 3D MLLMs for CT report generation, including new augmentation techniques and insights into model and data size effects.

Findings

01. Knowledge-based report augmentation improves the GREEN score by up to 10%.

02. Report generation performance is largely independent of LLM size.

03. Using segmentation masks alongside CT volumes improves report quality.
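
A small sketch can make finding 03 concrete. One common way to give a model a segmentation mask together with an image is to stack the two as input channels. This page does not say how the authors fuse the mask with the CT volume, so channel stacking, the helper names `volume_shape` and `stack_image_and_mask`, and the nested-list volume layout are all assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: a volume is nested lists indexed [depth][height][width].
Volume = List[List[List[float]]]


def volume_shape(v: Volume) -> Tuple[int, int, int]:
    """Return (depth, height, width), assuming a non-empty, rectangular volume."""
    if not v or not v[0] or not v[0][0]:
        raise ValueError("volume must be non-empty in every dimension")
    return (len(v), len(v[0]), len(v[0][0]))


def stack_image_and_mask(ct: Volume, mask: Volume) -> List[Volume]:
    """Hypothetical helper: pair a CT volume with its segmentation mask as a
    2-channel input [ct, mask]. Channel stacking is an assumption, not
    necessarily the fusion used in the paper."""
    if volume_shape(ct) != volume_shape(mask):
        raise ValueError(
            f"shape mismatch: ct {volume_shape(ct)} vs mask {volume_shape(mask)}"
        )
    # Integer organ labels are cast to float so both channels share one dtype.
    mask_as_float = [[[float(x) for x in row] for row in plane] for plane in mask]
    return [ct, mask_as_float]


# Usage on a toy 1x2x2 volume (values in Hounsfield units) with a binary mask.
ct = [[[-1000.0, 40.0], [60.0, 300.0]]]
mask = [[[0, 1], [1, 0]]]
channels = stack_image_and_mask(ct, mask)
print(len(channels), channels[1])
```

In a real pipeline the same idea would typically be a single tensor stack, and the first layer of the vision encoder would then need to accept two input channels instead of one.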

Abstract

Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10%, achieving 2nd place in the MICCAI 2024 AMOS-MM challenge. Our results on the 1,687 cases from the AMOS-MM dataset show that RRG is largely independent of the size of the LLM under the same training protocol. We also show that a larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The…
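
The abstract reports that a larger volume does not always help when the ViT was pre-trained on smaller volumes. This page does not give the authors' explanation, but one general property of ViTs is useful when reasoning about it: a 3D ViT splits the volume into fixed-size patches, so the token count grows with the product of the three grid dimensions, and learned absolute positional embeddings exist only for the pre-training grid. The sketch below does that arithmetic; the helper names, patch size, and volume shapes are hypothetical, not the configuration used in the paper.

```python
from math import prod
from typing import Tuple

Shape3D = Tuple[int, int, int]


def patch_grid(volume_shape: Shape3D, patch_size: Shape3D) -> Shape3D:
    """Patch grid (D/pd, H/ph, W/pw) that a 3D ViT sees for one input volume."""
    for dim, p in zip(volume_shape, patch_size):
        if dim % p != 0:
            raise ValueError(f"dimension {dim} is not divisible by patch size {p}")
    return tuple(dim // p for dim, p in zip(volume_shape, patch_size))


def num_patch_tokens(volume_shape: Shape3D, patch_size: Shape3D) -> int:
    """Number of patch tokens: the product of the patch-grid dimensions."""
    return prod(patch_grid(volume_shape, patch_size))


def needs_pos_embed_interpolation(
    pretrain_shape: Shape3D, new_shape: Shape3D, patch_size: Shape3D
) -> bool:
    """Learned absolute positional embeddings cover only the pre-training grid;
    any other grid requires resizing (e.g., interpolating) them."""
    return patch_grid(pretrain_shape, patch_size) != patch_grid(new_shape, patch_size)


# Hypothetical sizes chosen only to show the arithmetic.
patch = (4, 16, 16)
small, large = (32, 256, 256), (64, 512, 512)
print(num_patch_tokens(small, patch), num_patch_tokens(large, patch))
print(needs_pos_embed_interpolation(small, large, patch))
```

Doubling every side here multiplies the token count by eight (2048 to 16384), and for a ViT with learned absolute positional embeddings, the embeddings learned at the smaller grid have to be resized before the larger volume can be encoded.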

Code & Models

Repositories

bowang-lab/amos-mm-solution (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies