# Gene expression and metadata based identification of key genes for lung cancer, COPD, and IPF using machine learning and statistical models

**Authors:** Mst. Farjana Yasmin, Md. Faruk Hosen, Md. Abul Basar, Anichur Rahman, Mahedi Hasan, Fahmid Al Farid, Hezerul Abdul Karim, Abu Saleh Musa Miah

PMC · DOI: 10.1371/journal.pone.0344666 · PLOS One · 2026-03-19

## TL;DR

This study uses machine learning and gene expression data to find key genes involved in lung cancer, COPD, and IPF, offering potential targets for new treatments.

## Contribution

The study identifies four key genes (ETS1, MSH2, RORA, PMAIP1) as potential therapeutic targets by integrating multiple bioinformatics and machine learning approaches.

## Key findings

- ETS1, MSH2, RORA, and PMAIP1 were identified as key hub genes across lung cancer, COPD, and IPF.
- Integration of differential gene expression, PPI networks, and metadata revealed four candidate genes for further research.
- Proposed drug compounds targeting these genes suggest new treatment avenues for the diseases.
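
The integration step in the second finding amounts to a set intersection across the evidence sources. The following is a minimal sketch of that idea, not the authors' code: the function name `intersect_gene_sets`, the dictionary keys, and the `PLACEHOLDER_*` entries are hypothetical, and the toy lists are not the study's actual gene lists.

```python
from functools import reduce


def intersect_gene_sets(gene_sets):
    """Return the genes present in every named set, sorted alphabetically."""
    if not gene_sets:
        return []
    common = reduce(set.intersection, (set(genes) for genes in gene_sets.values()))
    return sorted(common)


# Toy inputs for illustration only (assumption): the PLACEHOLDER_* names stand in
# for genes that appear in some evidence sources but not in all of them.
candidate_sources = {
    "common_degs": {"ETS1", "MSH2", "RORA", "PMAIP1", "PLACEHOLDER_1", "PLACEHOLDER_2"},
    "hub_genes": {"ETS1", "MSH2", "RORA", "PMAIP1", "PLACEHOLDER_1"},
    "module_genes": {"ETS1", "MSH2", "RORA", "PMAIP1", "PLACEHOLDER_3"},
    "meta_hub_genes": {"ETS1", "MSH2", "RORA", "PMAIP1"},
}

key_genes = intersect_gene_sets(candidate_sources)
print(key_genes)
```

A strict intersection keeps only genes supported by every source, which favors specificity over sensitivity. A gene missing from a single list is dropped.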

## Abstract

Lung cancer (LC) is one of the most prevalent and deadly cancers worldwide and a major public health challenge. Patients with chronic obstructive pulmonary disease (COPD) and idiopathic pulmonary fibrosis (IPF) are at significantly higher risk of developing lung cancer. Despite advances in research, the primary molecular pathways underlying these diseases remain poorly understood. The current study aimed to identify potential therapeutic genes for LC, COPD, and IPF using machine learning (ML) and bioinformatics methods. Differentially expressed genes (DEGs) were identified in three datasets using DESeq2 and limma, and the genes common to the DEG lists from these datasets were then selected. Protein-protein interaction (PPI) networks were built with STRING, and major hub genes were identified through topological analysis. Key hub genes, including ETS1, MSH2, RORA, and PMAIP1, were detected. KEGG and cancer pathway analyses were conducted to evaluate the contributions of these genes to disease processes. Network-based analyses covering transcription factors, Gene Ontology (GO) terms, gene–miRNA relationships, and survival data were used to further narrow the list of differentially expressed genes linked to LC, COPD, and IPF. Metadata for the hub genes were aggregated from prior studies to integrate earlier findings. Finally, four key candidate genes (ETS1, MSH2, RORA, and PMAIP1) were obtained by intersecting the common DEGs, hub genes, major module genes, and meta-hub genes. The results provide a solid framework for subsequent research and therapeutic strategies for LC, COPD, and IPF. Potential drug compounds targeting the identified key genes are also proposed, offering new avenues for treatment development.
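
The abstract does not say which topological measure was used to pick hub genes from the PPI network. As one common possibility, the sketch below ranks nodes by degree (number of distinct interaction partners). The function name `rank_by_degree`, the `NODE_*` placeholders, and the edge list are hypothetical and do not reproduce the STRING network from the study.

```python
from collections import defaultdict


def rank_by_degree(edges, top_n=3):
    """Rank genes by number of distinct interaction partners (degree).

    Ties are broken alphabetically so the ranking is deterministic.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    ranked = sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(partners)) for gene, partners in ranked[:top_n]]


# Toy undirected edge list (assumption): NODE_* entries are placeholders, and the
# last edge repeats the first one in reverse to show that duplicates are ignored.
toy_edges = [
    ("ETS1", "NODE_A"),
    ("ETS1", "NODE_B"),
    ("ETS1", "MSH2"),
    ("MSH2", "NODE_A"),
    ("RORA", "NODE_C"),
    ("NODE_A", "ETS1"),
]

hubs = rank_by_degree(toy_edges, top_n=2)
print(hubs)
```

Degree is only one option. Betweenness, closeness, or a consensus of several measures would change which nodes count as hubs, so the choice of measure should be reported alongside the result.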

## Linked entities

- **Genes:** ETS1 (ETS proto-oncogene 1, transcription factor) [NCBI Gene 2113], MSH2 (mutS homolog 2) [NCBI Gene 4436], RORA (RAR related orphan receptor A) [NCBI Gene 6095], PMAIP1 (phorbol-12-myristate-13-acetate-induced protein 1) [NCBI Gene 5366]
- **Diseases:** lung cancer (MONDO:0005138), chronic obstructive pulmonary disease (MONDO:0005002), idiopathic pulmonary fibrosis (MONDO:0800029)

## Full-text entities

- **Genes:** PIK3CB (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta) [NCBI Gene 5291] {aka P110BETA, PI3K, PI3KBETA, PIK3C1}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, ETS1 (ETS proto-oncogene 1, transcription factor) [NCBI Gene 2113] {aka ETS-1, EWSR2, c-ets-1, p54}, CTNNB1 (catenin beta 1) [NCBI Gene 1499] {aka CTNNB, EVR7, MRD19, NEDSDV, armadillo}, CFH (complement factor H) [NCBI Gene 3075] {aka AHUS1, AMBP1, ARMD4, ARMS1, CFHL3, FH}, EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}, CCNL1 (cyclin L1) [NCBI Gene 57018] {aka ANIA6A, BM-001, PRO1073, ania-6a}, RORA (RAR related orphan receptor A) [NCBI Gene 6095] {aka IDDECA, NR1F1, ROR1, ROR2, ROR3, RORa1}, KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845] {aka 'C-K-RAS, C-K-RAS, CFC2, K-RAS2A, K-RAS2B, K-RAS4A}, PMAIP1 (phorbol-12-myristate-13-acetate-induced protein 1) [NCBI Gene 5366] {aka APR, NOXA}, IL17A (interleukin 17A) [NCBI Gene 3605] {aka CTLA-8, CTLA8, IL-17, IL-17A, IL17, ILA17}, F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}, NEDD9 (neural precursor cell expressed, developmentally down-regulated 9) [NCBI Gene 4739] {aka CAS-L, CAS2, CASL, CASS2, HEF1}, MSH2 (mutS homolog 2) [NCBI Gene 4436] {aka COCA1, FCC1, HNPCC, HNPCC1, LCFS2, LYNCH1}, NFASC (neurofascin) [NCBI Gene 23114] {aka NEDCPMD, NF, NRCAML}, AKT1 (AKT serine/threonine kinase 1) [NCBI Gene 207] {aka AKT, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA}, SORD (sorbitol dehydrogenase) [NCBI Gene 6652] {aka HEL-S-95n, HMNR8, RDH, SDH, SORD1, SORDD}
- **Diseases:** cardiovascular disease (MESH:D002318), immune abnormalities (MESH:D007154), airway obstruction (MESH:D000402), NSCLC (MESH:D002289), smoking (MESH:D015208), respiratory diseases (MESH:D012140), Spinal Cord Injury (MESH:D013119), Reduced lung function (MESH:D001523), COPD (MESH:D029424), interstitial pulmonary fibrosis (MESH:D011658), lung disease (MESH:D008171), chronic inflammation (MESH:D007249), interstitial lung disease (MESH:D017563), IPF (MESH:D054990), infection (MESH:D007239), insulin (MESH:D007333), small cell lung cancer (MESH:D055752), Malaria (MESH:D008288), carcinogenesis (MESH:D063646), Lung Cancer (MESH:D008175), fibrosis (MESH:D005355), deaths (MESH:D003643), cancer (MESH:D009369)
- **Chemicals:** N-acetyl-L-cysteine (MESH:D000111), ivermectin (MESH:D007559), astemizole (MESH:D016589)
- **Species:** Homo sapiens (human, species) [taxon 9606], Nicotiana tabacum (American tobacco, species) [taxon 4097]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13001922/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13001922/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/PMC13001922/full.md

---
Source: https://tomesphere.com/paper/PMC13001922