# Pre-operative T-stage discrimination in gallbladder cancer using machine learning and DeepSeek-R1

**Authors:** Joongwon Chae, Zhenyu Wang, Duanpo Wu, Lian Zhang, Alexander Tuzikov, Magrupov Talat Madiyevich, Min Xu, Dongmei Yu, Peiwu Qin

PMC · DOI: 10.3389/fonc.2025.1613462 · 2025-08-01

## TL;DR

This study found that routine blood biomarkers combined with machine learning discriminate poorly between early T stages (T1 vs T2) of gallbladder cancer, whereas a large language model reading radiology reports achieved high accuracy.

## Contribution

Demonstrated the superior performance of a large language model over biomarker-based machine learning for T-stage discrimination in gallbladder cancer.

## Key findings

- Blood-biomarker-based machine learning models showed poor T-stage discrimination, with test-set AUROC near chance level.
- DeepSeek-R1 achieved 89.6% accuracy using radiology reports alone; adding biomarker data did not improve it.
- SMOTE improved cross-validation accuracy but did not improve test-set performance of the biomarker models.

## Abstract

Gallbladder cancer (GBC) frequently exhibits non-specific early symptoms, delaying diagnosis. This study (i) assessed whether routine blood biomarkers can distinguish early T stages via machine learning and (ii) compared the T-stage discrimination performance of a large language model (DeepSeek-R1) when supplied with (a) radiology-report text alone versus (b) radiology-report text plus blood-biomarker values.
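The two prompting conditions, (a) report only and (b) report plus biomarkers, can be sketched as a small prompt builder. This is a minimal illustration under stated assumptions: the instruction wording, the function names `build_prompt` and `parse_label`, and the rule of taking the last T1/T2 label in the model's reply are all hypothetical, since the exact prompt used with DeepSeek-R1 is not reproduced here. Only the split between the two conditions and the seven biomarkers come from the study.

```python
import re
from typing import Mapping, Optional

# Hypothetical instruction text; NOT the study's actual prompt.
INSTRUCTION = (
    "You are assisting with pre-operative staging of gallbladder cancer. "
    "Think step by step, then answer with exactly one label: T1 or T2."
)

# The seven blood variables named in the study, in a fixed display order.
BIOMARKER_ORDER = ("NLR", "MLR", "PLR", "CEA", "CA19-9", "CA125", "AFP")


def build_prompt(report: str, biomarkers: Optional[Mapping[str, float]] = None) -> str:
    """Build a zero-shot chain-of-thought prompt.

    Condition (a): biomarkers is None -> radiology report only.
    Condition (b): biomarkers given  -> report plus the blood biomarker profile.
    """
    parts = [INSTRUCTION, "", "Radiology report:", report.strip()]
    if biomarkers is not None:
        parts += ["", "Blood biomarkers:"]
        # Same order for every patient so profiles are presented identically.
        for name in BIOMARKER_ORDER:
            if name in biomarkers:
                parts.append(f"- {name}: {biomarkers[name]}")
    return "\n".join(parts)


def parse_label(answer: str) -> Optional[str]:
    """Return the last standalone 'T1' or 'T2' in the reply (hypothetical rule)."""
    labels = re.findall(r"\bT[12]\b", answer)
    return labels[-1] if labels else None
```

Taking the last label is one simple way to read a final answer out of a reply that reasons step by step before concluding; a stricter output format (for example, a required final line) would be a reasonable alternative.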

We retrospectively analyzed 232 patients with pathologically confirmed GBC treated at Lishui Central Hospital between 2023 and 2024 (T1, n = 51; T2, n = 181). Random Forest, Support Vector Machine (SVC), XGBoost, and LightGBM models were trained on seven blood variables: neutrophil-to-lymphocyte ratio (NLR), monocyte-to-lymphocyte ratio (MLR), platelet-to-lymphocyte ratio (PLR), carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), carbohydrate antigen 125 (CA125), and alpha-fetoprotein (AFP). The Synthetic Minority Over-sampling Technique (SMOTE) was applied only to the training folds in one setting and omitted in the other. Model performance was evaluated on an independent test set (N = 47) by the area under the receiver operating characteristic curve (AUROC), with 95% confidence intervals (CIs) from 1,000 bootstrap resamples; cross-validation (CV) accuracy served as a supplementary metric. DeepSeek-R1 was prompted in a zero-shot, chain-of-thought manner to classify T1 versus T2 using (a) the radiology report alone or (b) the report plus the patient’s biomarker profile.
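The test-set metrics described above (AUROC with a 95% CI from 1,000 bootstrap resamples, plus per-class recall) can be sketched in plain Python. This assumes a percentile bootstrap that resamples patients with replacement and redraws any resample containing only one class; the paper's exact bootstrap variant is not stated here, and the helper names and the coding of T1 as label 1 are illustrative assumptions. AUROC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which is why a value near 0.5 means chance-level ranking.

```python
import random
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Labels are assumed binary: 1 = positive (e.g. T1), 0 = negative.
    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both classes present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(y_true: Sequence[int], scores: Sequence[float],
                       n_boot: int = 1000, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUROC, resampling patients with replacement."""
    auroc(y_true, scores)  # fail early if a class is missing entirely
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 1 not in yb or 0 not in yb:
            continue  # one-class resample: AUROC undefined, draw again
        stats.append(auroc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def recall(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """Fraction of true `label` cases that were predicted as `label`."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)
```

With a small test set such as N = 47, and only a minority of T1 cases in it, the bootstrap interval around an AUROC near 0.5 is typically wide, which is one reason to report the CI alongside the point estimate.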

Biomarker-based machine-learning models yielded uniformly poor T-stage discrimination. Without SMOTE, individual models such as XGBoost achieved an AUROC of only 0.508 on the independent test set, and recall for the T1 class remained low (e.g., 14.3% for some models), indicating performance near random chance. Applying SMOTE to the training data produced statistically significant gains in CV accuracy for several models (e.g., XGBoost CV Acc. 0.71 → 0.80, p = 0.005; LightGBM CV Acc. [No-SMOTE] → [SMOTE], p = 0.004). However, these gains did not translate into better discrimination on the independent test set; for instance, XGBoost’s AUROC decreased from 0.508 to 0.473 after SMOTE. Overall, the biomarker models failed to provide clinically meaningful T-stage differentiation. DeepSeek-R1 analyzing radiology text alone reached 89.6% accuracy on the full 232-patient cohort and consistently flagged T2 cases based on phrases such as “gallbladder wall thickening.” Supplying biomarker values did not change accuracy (89.6%).
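The setting in which SMOTE touches only the training folds can be sketched as follows. This is an illustrative assumption, not the authors' pipeline: synthetic minority (T1) points are generated by interpolating between a minority sample and one of its k nearest minority neighbors, and this happens only inside the training part of each stratified fold, so every validation fold contains real patients only. The `nearest_centroid` classifier is a hypothetical stand-in for Random Forest, SVC, XGBoost, or LightGBM. In practice one would more likely use `imblearn.over_sampling.SMOTE` inside an `imblearn.pipeline.Pipeline`, which applies the sampler only when fitting.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def smote(X: Sequence[Sequence[float]], y: Sequence[int], minority: int,
          k: int = 5, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Binary SMOTE sketch: add synthetic minority rows until classes balance."""
    rng = random.Random(seed)
    minor = [list(x) for x, lab in zip(X, y) if lab == minority]
    n_needed = sum(1 for lab in y if lab != minority) - len(minor)
    new_X, new_y = [list(x) for x in X], list(y)
    if n_needed <= 0 or len(minor) < 2:
        return new_X, new_y
    k = min(k, len(minor) - 1)
    for _ in range(n_needed):
        base = rng.choice(minor)
        # k nearest minority neighbors of `base` (excluding base itself)
        neighbors = sorted((m for m in minor if m is not base),
                           key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> nb
        new_X.append([b + gap * (v - b) for b, v in zip(base, nb)])
        new_y.append(minority)
    return new_X, new_y


def stratified_folds(y: Sequence[int], n_splits: int = 5, seed: int = 0) -> List[List[int]]:
    """Deal each class's shuffled indices round-robin into folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_splits].append(i)
    return folds


def cv_accuracy(X: Sequence[Vector], y: Sequence[int],
                fit_predict: Callable[[List[Vector], List[int], List[Vector]], List[int]],
                minority: int, use_smote: bool = True, n_splits: int = 5) -> float:
    """Mean CV accuracy; SMOTE (if enabled) sees only each fold's training part."""
    accs = []
    for val_idx in stratified_folds(y, n_splits):
        val_set = set(val_idx)
        tr_X = [list(X[i]) for i in range(len(y)) if i not in val_set]
        tr_y = [y[i] for i in range(len(y)) if i not in val_set]
        if use_smote:
            tr_X, tr_y = smote(tr_X, tr_y, minority)
        preds = fit_predict(tr_X, tr_y, [list(X[i]) for i in val_idx])
        accs.append(sum(p == y[i] for p, i in zip(preds, val_idx)) / len(val_idx))
    return sum(accs) / len(accs)


def nearest_centroid(tr_X: List[Vector], tr_y: List[int], te_X: List[Vector]) -> List[int]:
    """Hypothetical stand-in classifier: predict the class with the closest mean."""
    cents = {}
    for lab in set(tr_y):
        rows = [x for x, l in zip(tr_X, tr_y) if l == lab]
        cents[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return [min(cents, key=lambda lab: math.dist(x, cents[lab])) for x in te_X]
```

Because each synthetic T1 row is an interpolation between existing T1 training rows, oversampling rebalances the training set without adding information about unseen patients, which is consistent with the study's observation that CV gains did not carry over to the independent test set.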

The evaluated blood biomarkers did not independently aid early T-stage discrimination, and SMOTE offered no meaningful performance gain. In contrast, a radiology-text-driven large language model delivered high accuracy with interpretable rationales, highlighting its potential to guide surgical strategy in GBC. Prospective multi-center studies with larger cohorts are warranted to confirm these findings.

## Linked entities

- **Diseases:** gallbladder cancer (MONDO:0003220)

## Full-text entities

- **Genes:** CEACAM3 (CEA cell adhesion molecule 3) [NCBI Gene 1084] {aka CD66D, CEA, CGM1, CGM1a, W264, W282}, MUC16 (mucin 16, cell surface associated) [NCBI Gene 94025] {aka CA125}, AFP (alpha fetoprotein) [NCBI Gene 174] {aka AFPD, FETA, HPAFP}
- **Diseases:** GBC (MESH:D005706), T (MESH:D001260)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12355213/full.md

---
Source: https://tomesphere.com/paper/PMC12355213