TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis

Sijing Li; Zhongwei Qiu; Jiang Liu; Wenqiao Zhang; Tianwei Lin; Yihan Xie; Jianxiang An; Boxiang Yun; Chenglin Yang; Jun Xiao; Guangyu Guo; Jiawen Yao; Wei Liu; Yuan Gao; Ke Yan; Weiwei Cao; Zhilin Zheng; Tony C. W. Mok; Kai Cao; Yu Shi; Jiuyu Zhang; Jian Zhou; Beng Chin Ooi; Yingda Xia; Ling Zhang

arXiv:2603.05867·cs.CV·March 10, 2026

TL;DR

TumorChain introduces a multimodal reasoning framework and a large-scale dataset for traceable, accurate tumor analysis from imaging and clinical data, enhancing interpretability and reducing errors in clinical oncology.

Contribution

The paper presents TumorChain, a novel multimodal interleaved reasoning model, and TumorCoT, a large-scale dataset for step-by-step tumor analysis, advancing clinical interpretability and accuracy.

Findings

01 Improved lesion detection accuracy

02 Enhanced impression generation quality

03 Better pathology classification performance

Abstract

Accurate tumor analysis is central to clinical radiology and precision oncology, where early detection, reliable lesion characterization, and pathology-level risk assessment guide diagnosis and treatment planning. Chain-of-Thought (CoT) reasoning is particularly important in this setting because it enables step-by-step interpretation from imaging findings to clinical impressions and pathology conclusions, improving traceability and reducing diagnostic errors. Here, we target the clinical tumor analysis task and build a large-scale benchmark that operationalizes a multimodal reasoning pipeline, spanning findings, impressions, and pathology predictions. We curate TumorCoT, a large-scale dataset of 1.5M CoT-labeled VQA instructions paired with 3D CT scans, with step-aligned rationales and cross-modal alignments along the trajectory from findings to impression to pathology, enabling…

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

+ The TumorCoT-1.5M dataset is a highly valuable contribution to the field. A 3D, instruction-tuned dataset of this scale, specifically designed for traceable, multi-step reasoning in a high-stakes domain like oncology, is a significant achievement and will likely enable new research directions.
+ The TumorChain model's core mechanism, Iterative Interleaved Reasoning (IIR), is an effective way to handle the high dimensionality of 3D medical images.
+ TumorChain achieves top performance across su

Weaknesses

- The dataset relies heavily on an MLLM-based data-generation pipeline. This introduces a risk of factual errors, bias, or hallucinations. The paper does not sufficiently discuss the quality control process for this generated data, beyond mentioning expert reviews. Furthermore, critical details on data splitting (e.g., ensuring splits are at the patient level to prevent leakage) are not detailed in the main text.
- The paper uses terms like "causality-guided inference" and "iterative causal reas
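The patient-level splitting concern raised above can be illustrated with a short sketch. This is not the authors' procedure, which is not described here; it only shows one common way to keep every record from a given patient on the same side of a train/test split. The record layout and the `patient_id` field name are assumptions made for illustration.

```python
import random
from collections import defaultdict


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both train and test.

    `records` is a list of dicts; the `patient_id` key is a hypothetical
    field name, not taken from the TumorCoT dataset.
    """
    # Group all records (e.g. VQA instructions) by the patient they belong to.
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    # Shuffle patients, not records, so the split unit is the patient.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Hold out a fraction of patients (at least one) for testing.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [r for pid in patient_ids if pid not in test_ids for r in by_patient[pid]]
    test = [r for pid in patient_ids if pid in test_ids for r in by_patient[pid]]
    return train, test
```

Splitting at the record level instead would let questions about the same scan land in both sets, which is the leakage the reviewer is asking the authors to rule out.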

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. This paper introduces a reasoning paradigm aligned with clinical workflows, featuring a traceable closed-loop process of 'Findings, Impressions, Pathology'.
2. This paper presents TumorCoT, a large-scale dataset of 1.5M CoT-labeled VQA instructions paired with 3D CT scans, with step-aligned rationales and cross-modal alignments along the “findings→impression→pathology” trajectory.
3. This paper proposes TumorChain, a multimodal interleaved reasoning framework that tightly couples 3D imaging enc

Weaknesses

1. The authors use a proposed multi-agent, knowledge-graph-guided engine to generate chain-of-thought reasoning, which may introduce bias. Could the authors conduct a small-scale CoT evaluation with medical experts to validate the reasoning quality?
2. Does the use of the segmentation model increase inference or training latency, potentially reducing efficiency? How do the authors address this issue?
3. The comparison with commercial models only includes GPT-5-Mini and Gemini2.0-Flash. How do th

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The creation of TumorCoT-1.5M, a comprehensive CoT dataset, provides a valuable resource for the community.
2. The proposed TumorChain framework enables deep reasoning steps and achieves significant performance gains on the TumorCoT and DeepTumorVQA benchmarks.

Weaknesses

1. The dataset and the experiments are limited to the CT modality. It remains unclear whether the approach generalizes to other medical reasoning tasks or modalities.
2. The CoT data generation relies on LLMs and lacks human clinical validation. This raises concerns about potential biases and hallucinations.

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling