# Integrative Analysis of Multi-source Public Databases to Screen Core Genes for Constructing A Prognostic Risk Model in Lung Adenocarcinoma

**Authors:** Chengmeng WANG, Lu ZHANG, Yu ZHANG, Yu WANG, Meng WANG

PMC · DOI: 10.3779/j.issn.1009-3419.2025.102.38 · Chinese Journal of Lung Cancer · 2025-10-20

## TL;DR

This study identifies key genes linked to drug resistance and prognosis in lung adenocarcinoma, building a risk model to improve treatment and survival predictions.

## Contribution

A novel prognostic risk model for lung adenocarcinoma based on integrative analysis of multi-source public databases and drug resistance-related genes.

## Key findings

- 12 core differentially expressed genes (e.g., HMGA1, PLEK2) were identified as potential biomarkers for drug resistance and prognosis.
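- A 9-gene risk model was built with LASSO regression and validated in the GSE31210 cohort, with AUCs of 0.700, 0.647, and 0.675 at 1, 3, and 5 years; the risk score remained an independent prognostic factor after multivariate adjustment.
- PLEK2 was highly expressed in lung adenocarcinoma tissues and was further elevated in EGFR-TKI-resistant cell lines.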

## Abstract

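Resistance to tyrosine kinase inhibitors (TKIs) is a prominent problem in targeted therapy for lung adenocarcinoma (LUAD), and key molecular markers associated with resistance and prognosis are urgently needed to guide precision treatment. This study aimed to explore the molecular mechanisms of TKI resistance in LUAD, screen core differentially expressed genes (DEGs), clarify how different gene clusters relate to patient survival and drug response, and construct and validate a risk model for predicting LUAD prognosis, providing a basis for precision treatment and prognostic assessment.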

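Multiple LUAD-related datasets, including GSE162045 and GSE114647, were integrated. Core overlapping DEGs were screened with a Venn diagram, and a gene correlation network was constructed. Samples were grouped by consensus clustering, and t-SNE dimensionality reduction was used to visualize and verify cluster stability and separation. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis and Gene Set Enrichment Analysis (GSEA) were used to explore DEG function. Half-maximal inhibitory concentration (IC50) values for 12 drugs were compared between clusters to assess differences in drug sensitivity. Prognosis-related core genes were selected by LASSO regression to build a risk model, and model performance was validated in the GSE31210 cohort using a Sankey diagram, Kaplan-Meier survival curves, and receiver operating characteristic (ROC) curves. Expression differences of key genes across clusters and risk groups were analyzed, and Kaplan-Meier curves relating single-gene expression to survival were plotted. PLEK2 expression in LUAD tissues was analyzed across multiple datasets (GSE19804, GSE19188, GSE44077, GSE30219), and its protein level in epidermal growth factor receptor (EGFR)-TKI-resistant cell lines was measured by Western blot.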
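The Venn-diagram screening step amounts to intersecting the per-dataset DEG lists. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `core_overlap`, the key `third_dataset`, the placeholder genes `GENE_X` and `GENE_Y`, and the membership of every set are assumptions made for illustration.

```python
from functools import reduce


def core_overlap(deg_sets):
    """Return the genes present in every DEG set (the central region of a Venn diagram)."""
    if not deg_sets:
        return set()
    return reduce(set.intersection, (set(genes) for genes in deg_sets.values()))


# Hypothetical DEG lists keyed by dataset; which gene appears where is
# illustrative only and is NOT taken from the paper.
deg_sets = {
    "GSE162045": {"HMGA1", "PLEK2", "ID3", "GENE_X"},
    "GSE114647": {"HMGA1", "PLEK2", "DAPK2", "ID3"},
    "third_dataset": {"PLEK2", "HMGA1", "ID3", "GENE_Y"},
}

core = sorted(core_overlap(deg_sets))
print(core)
```

With real data, each set would hold the genes that pass the chosen fold-change and significance thresholds in one dataset.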

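Twelve core DEGs (including HMGA1 and PLEK2) were identified. With the number of clusters (K) set to 2, samples were stably divided into Cluster A and Cluster B; 10 core genes differed significantly in expression between the two clusters (P<0.0001), and overall survival (OS), disease-free survival (DFS), and progression-free survival (PFS) were all significantly better in Cluster A than in Cluster B. The two clusters differed markedly in the mutation types of frequently mutated genes such as TP53, KRAS, and EGFR. KEGG enrichment analysis showed that the differential genes were mainly enriched in pathways such as "cell cycle" and "neuroactive ligand-receptor interaction". GSEA indicated that Cluster B was significantly associated with gene sets related to malignant tumor progression. Drug sensitivity analysis showed significant differences between the two clusters in IC50 values for 10 drugs. A risk model based on 9 genes was constructed; the high-risk group had a higher proportion of deaths and lower survival (P<0.0001), and the area under the curve (AUC) of the model at 1, 3, and 5 years was 0.700, 0.647, and 0.675, respectively. Validation in the GSE31210 cohort showed that the model was stable and generalizable. Key genes differed significantly in expression between risk groups (P<0.0001); high expression of HMGA1 and PLEK2 indicated poor prognosis, whereas ID3 and DAPK2 were not associated with prognosis. When clinical variables and the LASSO risk score were analyzed together, univariate Cox analysis showed that the risk score was significantly associated with OS (HR=0.49, P=3.80×10⁻⁶); after multivariate adjustment, the risk score remained an independent prognostic factor (HR=0.57, P=6.40×10⁻⁴), indicating stable and independent predictive value. Public dataset analysis and Western blot experiments both confirmed that PLEK2 expression is upregulated in LUAD tissues and further elevated in EGFR-TKI-resistant cell lines.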
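A LASSO-derived risk score is conventionally a coefficient-weighted sum of gene expression values, with patients then split into high- and low-risk groups at a cutoff. The sketch below illustrates that convention only. The gene names, coefficients, and expression values are hypothetical, and the median cutoff is an assumption, since the abstract does not state how the groups were defined.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values using the model coefficients."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores):
    """Label each sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {sid: ("high" if score > cutoff else "low") for sid, score in scores.items()}


# Hypothetical coefficients and expression values, NOT the paper's 9-gene model.
coefficients = {"GENE_A": 0.30, "GENE_B": 0.20, "GENE_C": -0.10}
samples = {
    "S1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 3.0},
    "S2": {"GENE_A": 5.0, "GENE_B": 4.0, "GENE_C": 1.0},
    "S3": {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 5.0},
    "S4": {"GENE_A": 4.0, "GENE_B": 3.0, "GENE_C": 2.0},
}

scores = {sid: risk_score(expr, coefficients) for sid, expr in samples.items()}
groups = split_by_median(scores)
print(groups)
```

The resulting group labels are what a Kaplan-Meier comparison or a time-dependent ROC analysis would take as input.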

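The risk model constructed in this study can effectively predict the prognosis of LUAD patients. PLEK2 is highly expressed in LUAD and is associated with EGFR-TKI resistance, and it may serve as a potential prognostic marker and therapeutic target.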

## Linked entities

- **Genes:** HMGA1 (high mobility group AT-hook 1) [NCBI Gene 3159], PLEK2 (pleckstrin 2) [NCBI Gene 26499], TP53 (tumor protein p53) [NCBI Gene 7157], KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845], EGFR (epidermal growth factor receptor) [NCBI Gene 1956], ID3 (inhibitor of DNA binding 3) [NCBI Gene 3399], DAPK2 (death associated protein kinase 2) [NCBI Gene 23604]
- **Diseases:** lung adenocarcinoma (MONDO:0005061)

## Full-text entities

- **Genes:** HMGA1 (high mobility group AT-hook 1) [NCBI Gene 3159] {aka HMG-R, HMGA1A, HMGIY}, PLEK2 (pleckstrin 2) [NCBI Gene 26499], DAPK2 (death associated protein kinase 2) [NCBI Gene 23604] {aka DRP-1, DRP1}, KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845] {aka 'C-K-RAS, C-K-RAS, CFC2, K-RAS2A, K-RAS2B, K-RAS4A}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, TXK (TXK tyrosine kinase) [NCBI Gene 7294] {aka BTKL, PSCTK5, PTK4, RLK, TKL}, EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}, ID3 (inhibitor of DNA binding 3) [NCBI Gene 3399] {aka HEIR-1, bHLHb25}
- **Diseases:** deaths (MESH:D003643), tumors (MESH:D009369), LUAD (MESH:D000077192)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12782933/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12782933/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12782933/full.md

---
Source: https://tomesphere.com/paper/PMC12782933