# Application of Explainable Deep Learning in Differentiating Benign from Malignant Pulmonary Space-occupying Lesions and Classifying Pathological Subtypes of Lung Cancer

**Authors:** Haoran LI, Yuanyuan WANG, Yang WANG, Huihui HE, Junya LI, Yanning SU, Fanrui KONG, Xiangli LIU, Liuhui CHENG, Ya LI

PMC · DOI: 10.3779/j.issn.1009-3419.2025.102.36 · Chinese Journal of Lung Cancer · 2025-10-20

## TL;DR

This study applies explainable deep learning to multi-source clinical data to distinguish benign from malignant pulmonary lesions and to classify the pathological subtypes of lung cancer. The aim is to support clinical decision-making.

## Contribution

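The paper proposes a novel hybrid architecture, TT-ResMLP, for interpretable deep learning in pulmonary lesion diagnosis and lung cancer subtype classification. It combines a Transformer designed for tabular data (Tab-Transformer) with a residual multi-layer perceptron (ResMLP).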

## Key findings

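- All three models (Tab-Transformer, ResMLP, TT-ResMLP) performed well on both tasks. Tab-Transformer performed best on the test set for benign-malignant diagnosis, and TT-ResMLP performed best overall for lung cancer subtype classification.
- SHAP analysis ranked age, pleural retraction sign and thrombin time as leading features for benign-malignant diagnosis. Neuron-specific enolase (NSE) was important for predicting small cell carcinoma.
- Lung-RADS scores were highly important for all three subtypes. They correlated positively with the predicted probability of adenocarcinoma and negatively with that of squamous cell and small cell carcinoma.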
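The performance comparisons summarised above rest on several metrics: ROC/AUC, accuracy, sensitivity and specificity, plus a micro-averaged ROC for the three-class subtype task. The following is a minimal plain-Python sketch of these metrics. The function names and the label encoding (1 = malignant) are hypothetical.

AUC is computed through its rank interpretation: the probability that a randomly chosen positive scores above a randomly chosen negative. The micro-average binarises every label one-vs-rest and pools the labels and scores from all classes before computing a single AUC.

```python
from typing import List, Sequence, Tuple


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def micro_auc(y_true: Sequence[int], prob: Sequence[Sequence[float]], n_classes: int) -> float:
    """Micro-averaged AUC: binarise labels one-vs-rest and pool all (label, score) pairs."""
    flat_labels: List[int] = []
    flat_scores: List[float] = []
    for y, row in zip(y_true, prob):
        for c in range(n_classes):
            flat_labels.append(1 if y == c else 0)
            flat_scores.append(row[c])
    return binary_auc(flat_labels, flat_scores)


def sens_spec_acc(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy for binary labels (hypothetical: 1 = malignant)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


if __name__ == "__main__":
    print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(sens_spec_acc([1, 1, 0, 0], [1, 0, 0, 1]))  # (0.5, 0.5, 0.5)
```

The pairwise comparison is quadratic in the number of samples. That cost is negligible at the scale of a 345-patient cohort, but a sorting-based implementation would be preferable for large datasets.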

## Abstract

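Differentiating benign from malignant pulmonary space-occupying lesions and classifying the pathological subtypes of lung cancer are key to clinical decision-making. However, conventional approaches make insufficient use of multi-source clinical data, and deep learning models are poorly interpretable. This study uses a hybrid architecture (TT-ResMLP) that combines a Transformer designed for tabular data (Tab-Transformer) with a residual multi-layer perceptron (ResMLP). With it, the authors investigate how explainable deep learning algorithms perform in two tasks: the benign-malignant diagnosis of pulmonary space-occupying lesions, and the classification of lung cancer pathological subtypes.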
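The abstract does not specify the layers inside TT-ResMLP. As a rough illustration of the residual idea behind a ResMLP, the sketch below implements one generic pre-norm residual MLP block in plain Python:

y = x + W2·GELU(W1·LN(x) + b1) + b2

The block structure, the GELU activation and all function names are assumptions, not the authors' design. In a hybrid such as TT-ResMLP, blocks like this might be stacked on Tab-Transformer feature embeddings, but that arrangement is also hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalise x to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Return W @ x + b, with W stored row-major as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def gelu(v: float) -> float:
    """Tanh approximation of the GELU activation."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def residual_mlp_block(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Hypothetical pre-norm residual block: y = x + W2 @ GELU(W1 @ LN(x) + b1) + b2."""
    hidden = [gelu(v) for v in affine(w1, layer_norm(x), b1)]
    update = affine(w2, hidden, b2)
    return [xi + ui for xi, ui in zip(x, update)]


if __name__ == "__main__":
    # With all-zero weights and biases the block reduces to the identity map.
    x = [1.0, 2.0, 3.0]
    w1, b1 = [[0.0] * 3 for _ in range(4)], [0.0] * 4
    w2, b2 = [[0.0] * 4 for _ in range(3)], [0.0] * 3
    print(residual_mlp_block(x, w1, b1, w2, b2))  # [1.0, 2.0, 3.0]
```

Because the block adds its output to its input, a block with zero weights passes its input through unchanged. Keeping this identity path open is one common motivation for residual connections in deep stacks.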

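The authors collected imaging features, medical history, laboratory test results and other data from 345 patients with pathologically confirmed pulmonary space-occupying lesions. The patients were randomly divided into a training set and a test set at a ratio of 8:2. Stable features were screened with the Spearman test and the least absolute shrinkage and selection operator (LASSO), and the samples were balanced with the synthetic minority over-sampling technique (SMOTE). 10-fold cross-validation was used to improve the models' generalization.

Models were built with the Tab-Transformer algorithm, the ResMLP algorithm and TT-ResMLP. Their performance was evaluated with receiver operating characteristic (ROC) curves, the area under the curve (AUC), accuracy, specificity, sensitivity and the micro-averaged ROC (micro-ROC) curve. SHAP (SHapley Additive exPlanations) feature analysis was then performed on the best-performing model.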
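The data-handling steps in the methods can be sketched in plain Python: an 8:2 random split, 10-fold cross-validation and SMOTE oversampling. This is a minimal illustration, not the authors' pipeline. The abstract does not say whether SMOTE was applied before or after splitting, so the sketch assumes the common leakage-avoiding choice of oversampling training data only. The helper names, the choice of `k=5` neighbours and the random seeds are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split_idx(n: int, test_frac: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into (train, test); 8:2 by default."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def kfold_idx(idx: Sequence[int], k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) pairs; fold i takes every k-th index starting at i."""
    folds = [list(idx[i::k]) for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def smote(minority: List[List[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples. Each one is a random point on the
    segment between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)

    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: dist2(x, minority[j]))[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    train, test = train_test_split_idx(345)  # 276 training / 69 test indices
    folds = kfold_idx(train, k=10)
    # In a real pipeline SMOTE would run inside each training fold; here it runs on toy 2-D points.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(len(train), len(test), len(folds), len(smote(minority, n_new=4, k=2)))
```

If SMOTE runs only on the training portion, the validation and test samples stay purely real patients. Oversampling before splitting would let synthetic points derived from test cases leak into training.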

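For benign-malignant diagnosis, all three models performed well. Tab-Transformer performed best on the test set, followed by TT-ResMLP and ResMLP. SHAP analysis of the best-performing Tab-Transformer model ranked feature importance, in descending order, as age, pleural retraction sign, thrombin time, mean density, ground-glass changes and others. The pleural retraction sign contributed strongly to a malignant diagnosis, and its contribution increased further with older age and shorter thrombin time.

For lung cancer subtype classification, all three models performed excellently, and TT-ResMLP performed best overall. SHAP analysis further revealed three findings:

- The Lung Imaging Reporting and Data System (Lung-RADS) score was highly important in all three pathological subtypes.
- Male sex was positively associated with the prediction of squamous cell carcinoma.
- Neuron-specific enolase (NSE) played an important role in predicting small cell carcinoma.

In adenocarcinoma, the diagnostic probability was positively correlated with Lung-RADS category, and this correlation was more marked at low prothrombin time values. In the squamous cell and small cell carcinoma subgroups the correlation was negative, but sex and NSE level could strengthen its contribution to risk prediction. Feature decision-boundary analysis showed that Lung-RADS category discriminated adenocarcinoma well, whereas NSE showed stronger discrimination for small cell carcinoma.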
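SHAP attributions are Shapley values from cooperative game theory. Each feature's value is its marginal contribution to the model output, averaged with combinatorial weights over all coalitions of the other features. The sketch below computes Shapley values exactly by enumeration for a toy model. Features outside a coalition are filled from a single baseline vector; this is an assumption made for illustration.

The SHAP library's explainers instead approximate these values, typically against a background dataset, and the abstract does not state which explainer the authors used. Enumeration needs on the order of 2^n model evaluations, so it is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[List[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x. Features outside a coalition S are set to
    their baseline value (a single-reference simplification of SHAP)."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: List[float]) -> float:
    """Hypothetical model with an interaction between features 0 and 1."""
    return z[0] * z[1] + z[2]


if __name__ == "__main__":
    print(exact_shapley(toy_model, x=[2.0, 3.0, 1.0], baseline=[0.0, 0.0, 0.0]))
```

For this toy model with x = [2, 3, 1] and a zero baseline, the interaction term is split equally between features 0 and 1. The resulting attributions are [3, 3, 1], which sum to f(x) - f(baseline) = 7. This additivity is what lets SHAP plots decompose a single prediction into per-feature contributions, such as the age and thrombin-time effects described above.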

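The TT-ResMLP hybrid architecture achieves both benign-malignant diagnosis of pulmonary space-occupying lesions and classification of lung cancer pathological subtypes. The model is well interpretable and helps identify key predictive features and reveal how they interact. It therefore provides an effective tool for better understanding the biological behavior of lung cancer and for supporting clinical decisions.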

_Caption: Baseline clinical data of 345 patients with pulmonary space-occupying lesions of different pathological types._

_Caption: LASSO feature selection results for the benign-malignant diagnosis model of pulmonary space-occupying lesions and the lung cancer subtype classification model._
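LASSO selects features by shrinking regression coefficients with an L1 penalty, so weakly informative features end up with a coefficient of exactly zero and are dropped. Below is a minimal sketch that uses cyclic coordinate descent with soft-thresholding, under these assumptions:

- The loss is squared error, applied to standardised features and a centred outcome, with no intercept.
- The study's outcomes are categorical, so an L1-penalised logistic model would be the more usual choice in practice.
- The penalty `alpha` would normally be tuned by cross-validation; the value used here is arbitrary.

Function names and data are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho towards zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], alpha: float,
             n_iter: int = 200) -> List[float]:
    """Minimise (1/2n)||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.
    Assumes standardised columns of X and a centred y (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(col[i] * (r[i] + col[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / z
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= col[i] * delta
                b[j] = new_bj
    return b


def selected_features(names: Sequence[str], coef: Sequence[float], tol: float = 1e-8) -> List[str]:
    """Names of features whose LASSO coefficient is non-zero."""
    return [name for name, c in zip(names, coef) if abs(c) > tol]


if __name__ == "__main__":
    # Orthogonal, centred toy design with y = 3*x1 + 0.5*x2; x3 is unrelated to y.
    X = [[1.0, 1.0, 1.0], [-1.0, 1.0, -1.0], [1.0, -1.0, -1.0], [-1.0, -1.0, 1.0]]
    y = [3.5, -2.5, 2.5, -3.5]
    coef = lasso_cd(X, y, alpha=1.0)
    print(coef, selected_features(["x1", "x2", "x3"], coef))  # [2.0, 0.0, 0.0] ['x1']
```

On this orthogonal toy design, `alpha = 1` keeps only x1, with coefficient 3 - 1 = 2. x2 is dropped because its correlation with y (0.5) falls below the penalty.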

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138), adenocarcinoma (MONDO:0004970), squamous cell carcinoma (MONDO:0005096), small cell carcinoma (MONDO:0000402)

## Full-text entities

- **Genes:** F2 (coagulation factor II, thrombin) [NCBI Gene 2147] {aka PT, RPRGL2, THPH1}, ENO2 (enolase 2) [NCBI Gene 2026] {aka HEL-S-279, NSE}
- **Diseases:** adenocarcinoma (MESH:D000230), Pulmonary Space-occupying Lesions (MESH:D008171), Lung Cancer (MESH:D008175), squamous cell carcinoma (MESH:D002294), Pleural (MESH:D010995), small cell carcinoma (MESH:D018288)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12782943/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12782943/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC12782943/full.md

---
Source: https://tomesphere.com/paper/PMC12782943