TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
Sijing Li, Zhongwei Qiu, Jiang Liu, Wenqiao Zhang, Tianwei Lin, Yihan Xie, Jianxiang An, Boxiang Yun, Chenglin Yang, Jun Xiao, Guangyu Guo, Jiawen Yao, Wei Liu, Yuan Gao, Ke Yan, Weiwei Cao, Zhilin Zheng, Tony C. W. Mok, Kai Cao, Yu Shi, Jiuyu Zhang, Jian Zhou, Beng Chin Ooi

TL;DR
TumorChain introduces a multimodal reasoning framework and a large-scale dataset for traceable, accurate tumor analysis from imaging and clinical data, enhancing interpretability and reducing errors in clinical oncology.
Contribution
The paper presents TumorChain, a novel multimodal interleaved reasoning model, and TumorCoT, a large-scale dataset for step-by-step tumor analysis, advancing clinical interpretability and accuracy.
Findings
Improved lesion detection accuracy
Enhanced impression generation quality
Better pathology classification performance
Abstract
Accurate tumor analysis is central to clinical radiology and precision oncology, where early detection, reliable lesion characterization, and pathology-level risk assessment guide diagnosis and treatment planning. Chain-of-Thought (CoT) reasoning is particularly important in this setting because it enables step-by-step interpretation from imaging findings to clinical impressions and pathology conclusions, improving traceability and reducing diagnostic errors. Here, we target the clinical tumor analysis task and build a large-scale benchmark that operationalizes a multimodal reasoning pipeline, spanning findings, impressions, and pathology predictions. We curate TumorCoT, a large-scale dataset of 1.5M CoT-labeled VQA instructions paired with 3D CT scans, with step-aligned rationales and cross-modal alignments along the trajectory from findings to impression to pathology, enabling…
Peer Reviews
Decision: ICLR 2026 Poster
+ The TumorCoT-1.5M dataset is a highly valuable contribution to the field. A 3D, instruction-tuned dataset of this scale, specifically designed for traceable, multi-step reasoning in a high-stakes domain like oncology, is a significant achievement and will likely enable new research directions.
+ The TumorChain model's core mechanism, Iterative Interleaved Reasoning (IIR), is an effective way to handle the high dimensionality of 3D medical images.
+ TumorChain achieves top performance across su
- The dataset relies heavily on an MLLM-based data-generation pipeline. This introduces a risk of factual errors, bias, or hallucinations. The paper does not sufficiently discuss the quality control process for this generated data, beyond mentioning expert reviews. Furthermore, critical details on data splitting (e.g., ensuring splits are at the patient level to prevent leakage) are not detailed in the main text.
- The paper uses terms like "causality-guided inference" and "iterative causal reas
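The patient-level splitting concern raised in this review can be illustrated with a minimal sketch. It is not the authors' procedure, which the review says is not described in the main text. The record field `patient_id`, the function names, and the split fractions are all hypothetical. The idea is to assign the split per patient rather than per scan or per VQA item, so that no patient contributes data to more than one split.

```python
import hashlib


def assign_split(patient_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Map a patient ID to a split using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    # The first 32 bits of the digest give a roughly uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def split_by_patient(records):
    """Group records so that all items from one patient share a split."""
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        splits[assign_split(rec["patient_id"])].append(rec)
    return splits


def patients_overlap(splits) -> bool:
    """Return True if any patient ID appears in more than one split."""
    seen = {}
    for name, recs in splits.items():
        for rec in recs:
            if seen.setdefault(rec["patient_id"], name) != name:
                return True
    return False


# Illustrative data: 50 hypothetical patients with 3 scans each.
records = [
    {"patient_id": f"P{p:03d}", "scan_id": f"P{p:03d}-S{s}"}
    for p in range(50)
    for s in range(3)
]
splits = split_by_patient(records)
```

Because the assignment depends only on the patient ID, it stays stable when new scans are added for an existing patient. A check such as `patients_overlap(splits)` is a cheap way to document the absence of leakage.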
1. This paper introduces a reasoning paradigm aligned with clinical workflows, featuring a traceable closed-loop process of 'Findings, Impressions, Pathology'.
2. This paper presents TumorCoT, a large-scale dataset of 1.5M CoT-labeled VQA instructions paired with 3D CT scans, with step-aligned rationales and cross-modal alignments along the “findings→impression→pathology” trajectory.
3. This paper proposes TumorChain, a multimodal interleaved reasoning framework that tightly couples 3D imaging enc
1. The authors use a proposed multi-agent, knowledge-graph-guided engine to generate chain-of-thought reasoning, which may introduce bias. Could the authors conduct a small-scale CoT evaluation with medical experts to validate the reasoning quality?
2. Does the use of the segmentation model increase inference or training latency, potentially reducing efficiency? How do the authors address this issue?
3. The comparison with commercial models only includes GPT-5-Mini and Gemini2.0-Flash. How do th
1. The creation of TumorCoT-1.5M, a comprehensive CoT dataset, provides a valuable resource for the community.
2. The proposed TumorChain framework enables deep reasoning steps and achieves significant performance gains on the TumorCoT and DeepTumorVQA benchmarks.
1. The dataset and the experiments are limited to the CT modality, so it remains unclear whether the approach generalizes to other medical reasoning tasks or modalities.
2. The CoT data generation relies on LLMs and lacks human clinical validation, which raises concerns about potential biases and hallucinations.
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
