MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation
Sunggu Kyung, Jinyoung Seo, Hyunseok Lim, Dongyeong Kim, Hyungbin Park, Jimin Sung, Jihyun Kim, Wooyoung Jo, Yoojin Nam, Namkug Kim

TL;DR
MedRegion-CT is a region-focused multimodal LLM that enhances 3D CT report generation by capturing region-specific details, improving clinical relevance and interpretability over existing global-feature methods.
Contribution
We introduce a novel region-focused multimodal LLM framework with region pooling, pseudo-mask processing, and patient-specific attributions for improved 3D CT report generation.
Findings
Achieved state-of-the-art performance on the RadGenome-Chest CT report generation benchmark.
Outperformed existing methods in natural language quality and clinical relevance.
Enhanced interpretability through patient-specific attributions.
Abstract
The recent release of RadGenome-Chest CT has significantly advanced CT-based report generation. However, existing methods primarily focus on global features, making it challenging to capture region-specific details, which may cause certain abnormalities to go unnoticed. To address this, we propose MedRegion-CT, a region-focused Multi-Modal Large Language Model (MLLM) framework, featuring three key innovations. First, we introduce Region Representative Token Pooling, which utilizes a 2D-wise pretrained vision model to efficiently extract 3D CT features. This approach generates global tokens representing overall slice features and region tokens highlighting target areas, enabling the MLLM to process comprehensive information effectively. Second, a universal segmentation model generates pseudo-masks, which are then processed by a mask encoder to extract region-centric features.…
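The split between global tokens and region tokens described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state which pooling operator is used, so masked average pooling is assumed here, and the function name `pool_tokens`, the array shapes, and the toy data are all hypothetical. The feature array stands in for per-slice outputs of a 2D vision model.

```python
import numpy as np


def pool_tokens(features, region_mask):
    """Pool per-slice feature maps into global tokens and one region token.

    features:    array of shape (D, H, W, C), one 2D feature map per CT slice
                 (assumed to come from some pretrained 2D vision model).
    region_mask: binary array of shape (D, H, W) marking the target region.
    """
    # One global token per slice: average over all spatial positions.
    global_tokens = features.mean(axis=(1, 2))  # shape (D, C)

    # One region token: average of the feature vectors inside the mask.
    # Masked average pooling is an assumption made for illustration.
    weights = region_mask.astype(features.dtype)
    total = weights.sum()
    if total == 0:
        # Empty mask: fall back to a zero vector rather than dividing by zero.
        region_token = np.zeros(features.shape[-1], dtype=features.dtype)
    else:
        region_token = (features * weights[..., None]).sum(axis=(0, 1, 2)) / total

    return global_tokens, region_token


# Toy volume: 4 slices, 8x8 feature grid, 16 channels (arbitrary sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8, 16))
mask = np.zeros((4, 8, 8), dtype=bool)
mask[1:3, 2:6, 2:6] = True  # hypothetical region spanning two slices

global_tokens, region_token = pool_tokens(features, mask)
print(global_tokens.shape, region_token.shape)
```

In this sketch the global tokens summarize each slice as a whole, while the region token depends only on features under the mask, so a small abnormal area is not averaged away by the rest of the volume.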
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Machine Learning in Healthcare
