# PAH-former: Transfer learning for efficient discovery of pulmonary arterial hypertension-associated genes

**Authors:** Toshinaru Kawakami, Sosuke Hosokawa, Masamichi Ito, Atsumasa Kurozumi, Ryohei Tanaka, Shun Minatsuki, Junichi Ishida, Takayuki Isagawa, Satoshi Kodera, Norihiko Takeda, Nanako Kawaguchi

PMC · DOI: 10.1371/journal.pone.0344084 · 2026-03-06

## TL;DR

This study introduces PAH-former, a deep learning model that identifies genes linked to pulmonary arterial hypertension using limited patient data and validates them experimentally.

## Contribution

PAH-former is a transfer learning approach that fine-tunes Geneformer on public single-cell RNA sequencing data from patients with PAH, enabling discovery of PAH-associated genes from scarce data.

## Key findings

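- In silico perturbation with PAH-former identified 134 candidate genes, both known and novel, whose deletion was predicted to shift cells towards a disease phenotype.
- RNA interference-mediated knockdown of top novel candidates in human pulmonary artery endothelial cells significantly increased expression of SOX18, a PAH signature gene.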
- The model offers a broadly applicable strategy for gene discovery in rare diseases.

## Abstract

Pulmonary arterial hypertension (PAH) is a severe disease with limited effective therapies, making the discovery of new therapeutic targets crucial. While single-cell RNA sequencing (sc-RNA seq) offers a powerful tool for this purpose, its application is hampered by the scarcity of patient samples. This study addresses the problem of how to efficiently identify novel, functionally relevant disease-associated genes from limited publicly available data.

We employed transfer learning by fine-tuning Geneformer, a deep learning model, with public sc-RNA seq data from patients with PAH to create a specialized model called PAH-former. This model was used to perform in silico perturbation analysis to identify and rank candidate genes predicted to influence the disease state. For validation, we performed RNA interference-mediated knockdown of top novel candidate genes in human pulmonary artery endothelial cells and measured the expression of SRY-Box Transcription Factor 18 (SOX18), a signature gene of pulmonary arterial hypertension.

In silico perturbation analysis identified 134 candidate genes whose deletion was predicted to shift cells towards a disease phenotype. These included known disease-related genes as well as many novel ones. Subsequent in vitro validation demonstrated that knockdown of the candidate genes resulted in a significant increase in the expression of SOX18.

Our novel platform, PAH-former, provides a powerful and broadly applicable strategy for disease-related gene discovery. This approach enables the identification and validation of new candidate genes from limited data, promising to advance cell-specific mechanistic insights and accelerate therapeutic development for rare diseases like PAH.
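
The in silico perturbation step described in the abstract (delete a gene, then ask whether the cell moves towards the disease state) can be illustrated with a toy sketch. This is not the authors' code and does not use the Geneformer API. The rank-weighted `embed` function, the two-dimensional gene vectors, the `disease_centroid`, and the gene names `GENE_A`, `GENE_B`, and `GENE_C` are all hypothetical stand-ins chosen so the example is self-contained.

```python
import numpy as np


def embed(ranked_genes, gene_vectors):
    """Toy cell embedding: rank-weighted mean of per-gene vectors.

    Assumption: stands in for a fine-tuned transformer encoder.
    Genes earlier in the ranking receive larger weights.
    """
    weights = 1.0 / np.arange(1, len(ranked_genes) + 1)
    vecs = np.stack([gene_vectors[g] for g in ranked_genes])
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def deletion_shift(ranked_genes, gene, gene_vectors, disease_centroid):
    """Change in similarity to the disease centroid after deleting `gene`.

    A positive value means the deletion moves the cell towards disease.
    """
    before = cosine(embed(ranked_genes, gene_vectors), disease_centroid)
    perturbed = [g for g in ranked_genes if g != gene]
    after = cosine(embed(perturbed, gene_vectors), disease_centroid)
    return after - before


def rank_candidates(ranked_genes, gene_vectors, disease_centroid):
    """Score every gene in the cell and sort by shift, largest first."""
    shifts = {
        g: deletion_shift(ranked_genes, g, gene_vectors, disease_centroid)
        for g in ranked_genes
    }
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: GENE_A points away from the disease direction,
# so removing it should pull the cell towards the disease centroid.
gene_vectors = {
    "GENE_A": np.array([-1.0, 0.0]),
    "GENE_B": np.array([1.0, 0.0]),
    "GENE_C": np.array([0.0, 1.0]),
}
disease_centroid = np.array([1.0, 0.0])
cell = ["GENE_C", "GENE_A", "GENE_B"]  # genes ordered by expression rank

ranked = rank_candidates(cell, gene_vectors, disease_centroid)
for gene, shift in ranked:
    print(f"{gene}: {shift:+.3f}")
```

In this toy setup, genes with a positive shift play the role of the candidates described in the abstract: their deletion is predicted to move the cell towards the disease phenotype. A real analysis would replace `embed` with embeddings from the fine-tuned model and aggregate shifts over many cells.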

## Linked entities

- **Genes:** SOX18 (SRY-box transcription factor 18) [NCBI Gene 54345]
- **Diseases:** pulmonary arterial hypertension (MONDO:0015924)

## Full-text entities

- **Genes:** SPARC (secreted protein acidic and cysteine rich) [NCBI Gene 6678] {aka BM-40, OI17, ON, ONT}, VEGFA (vascular endothelial growth factor A) [NCBI Gene 7422] {aka L-VEGF, MVCD1, VEGF, VPF}, S100A6 (S100 calcium binding protein A6) [NCBI Gene 6277] {aka 2A9, 5B10, CABP, CACY, PRA, S10A6}, SOD2 (superoxide dismutase 2) [NCBI Gene 6648] {aka GC1, GClnc1, IPO-B, IPOB, MNSOD, MVCD6}, ICAM1 (intercellular adhesion molecule 1) [NCBI Gene 3383] {aka BB2, CD54, P3.58}, RGMA (repulsive guidance molecule BMP co-receptor a) [NCBI Gene 56963] {aka RGM}, TXNIP (thioredoxin interacting protein) [NCBI Gene 10628] {aka ARRDC6, EST01027, HHCPA78, THIF, VDUP1}, NFKB1 (nuclear factor kappa B subunit 1) [NCBI Gene 4790] {aka CVID12, EBP-1, KBF1, NF-kB, NF-kB1, NF-kappa-B1}, GUCY1A1 (guanylate cyclase 1 soluble subunit alpha 1) [NCBI Gene 2982] {aka GC-S-alpha-1, GC-SA3, GCS-alpha-3, GUC1A3, GUCA3, GUCSA3}, TNF (tumor necrosis factor) [NCBI Gene 7124] {aka DIF, IMD127, TNF-alpha, TNFA, TNFSF2, TNLG1F}, PAH (phenylalanine hydroxylase) [NCBI Gene 5053] {aka PH, PKU, PKU1}, SOX18 (SRY-box transcription factor 18) [NCBI Gene 54345] {aka HLTRS, HLTS}, MT2A (metallothionein 2A) [NCBI Gene 4502] {aka MT-2, MT-II, MT2}, KDR (kinase insert domain receptor) [NCBI Gene 3791] {aka CD309, FLK1, VEGFR, VEGFR2}, NOS3 (nitric oxide synthase 3) [NCBI Gene 4846] {aka EC-NOS, ECNOS, MYMY8, NOSIII, cNOS, eNOS}, MTA2 (metastasis associated 1 family member 2) [NCBI Gene 9219] {aka MTA1L1, PID}, HSP90AA1 (heat shock protein 90 alpha family class A member 1) [NCBI Gene 3320] {aka EL52, HEL-S-65p, HSP86, HSP89A, HSP90A, HSP90N}, EPAS1 (endothelial PAS domain protein 1) [NCBI Gene 2034] {aka ECYT4, HIF2A, HLF, MOP2, PASD2, bHLHe73}, PTGIR (prostaglandin I2 receptor) [NCBI Gene 5739] {aka IP, PRIPR}, HMGB2 (high mobility group box 2) [NCBI Gene 3148] {aka HMG2}, CACYBP (calcyclin binding protein) [NCBI Gene 27101] {aka GIG5, PNAS-107, S100A6BP, SIP}, BMP1 (bone morphogenetic protein 1) [NCBI Gene 649] {aka OI13, PCOLC, PCP, TLD}, JUNB (JunB proto-oncogene, AP-1 transcription factor subunit) [NCBI Gene 3726] {aka AP-1}, GAPDH (glyceraldehyde-3-phosphate dehydrogenase) [NCBI Gene 2597] {aka G3PD, GAPD, HEL-S-162eP}, EDNRA (endothelin receptor type A) [NCBI Gene 1909] {aka ET-A, ETA, ETA-R, ETAR, ETRA, MFDA}, PPARG (peroxisome proliferator activated receptor gamma) [NCBI Gene 5468] {aka CIMT1, FPLD3, GLM1, NR1C3, PPARG1, PPARG2}, MS4A1 (membrane spanning 4-domains A1) [NCBI Gene 931] {aka B1, Bp35, CD20, CVID5, FMC7, LEU-16}, ITGB1 (integrin subunit beta 1) [NCBI Gene 3688] {aka CD29, FNRB, GPIIA, MDF2, MSK12, VLA-BETA}, TGFB1 (transforming growth factor beta 1) [NCBI Gene 7040] {aka CAEND1, CED, DPD1, IBDIMDE, LAP, TGF-beta1}, GUCY1B1 (guanylate cyclase 1 soluble subunit beta 1) [NCBI Gene 2983] {aka GC-S-beta-1, GC-SB3, GUC1B3, GUCB3, GUCSB3, GUCY1B3}, VCAM1 (vascular cell adhesion molecule 1) [NCBI Gene 7412] {aka CD106, INCAM-100}, EDNRB (endothelin receptor type B) [NCBI Gene 1910] {aka ABCDS, ET-B, ET-BR, ETB, ETB1, ETBR}, IL6 (interleukin 6) [NCBI Gene 3569] {aka BSF-2, BSF2, CDF, HGF, HSF, IFN-beta-2}, PDE5A (phosphodiesterase 5A) [NCBI Gene 8654] {aka CGB-PDE, CN5A, PDE5}, NAMPT (nicotinamide phosphoribosyltransferase) [NCBI Gene 10135] {aka 1110035O14Rik, PBEF, PBEF1, VF, VISFATIN}, EFNB2 (ephrin B2) [NCBI Gene 1948] {aka EPLG5, HTKL, Htk-L, LERK5, ephrin-B2}, TXN (thioredoxin) [NCBI Gene 7295] {aka TRDX, TRX, TRX1, TXN1, Trx80}
- **Diseases:** hypoxia (MESH:D000860), cardiopulmonary diseases (MESH:D006323), cancer (MESH:D009369), pulmonary disorders (MESH:D008171), inflammation (MESH:D007249), pulmonary hypertension (MESH:D006976), endothelial (MESH:D005642), hypertrophic (MESH:D002312), right heart failure (MESH:D006333), IPAH (MESH:D065627), PAH (MESH:D000081029), death (MESH:D003643), rare diseases (MESH:D035583)
- **Chemicals:** prostacyclin (MESH:D011464), CO2 (MESH:D002245), Ca2+ (-)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** HLCA — Homo sapiens (Human), Finite cell line (CVCL_2492), S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232), HPAECs — Homo sapiens (Human), Finite cell line (CVCL_3716)

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12965534/full.md

---
Source: https://tomesphere.com/paper/PMC12965534