# Integrating machine learning with SHAP to uncover multi-tissue molecular signatures in Osteoarthritis progression

**Authors:** Jifeng Zhao, Jiasheng Tao, Yizhe Song, Jiyong Yang, Xiaodong Lin, Zhilong Ye, Chao Lu, Mingzhu Zeng, Weijian Chen, Wengang Liu

PMC · DOI: 10.1371/journal.pone.0343226 · PLOS One · 2026-03-09

## TL;DR

This study uses machine learning and SHAP to identify tissue-specific biomarkers for osteoarthritis, revealing key genes and immune changes in cartilage, synovium, and blood.

## Contribution

A novel ML-based framework that pairs multiple machine learning algorithms with SHAP for interpretable, tissue-specific biomarker discovery in osteoarthritis progression.

## Key findings

- Identified 8, 28, and 61 differentially expressed genes in cartilage, synovium, and blood, respectively.
- SHAP analysis ranked CSN1S1 and ABCA6 as the top predictive genes in cartilage, SCRG1 and CXCL2 in synovium, and GNL3L and C6orf111 in blood.
- Immune infiltration analysis showed mast cell and CD8+ T cell changes in cartilage and dendritic cell shifts in synovium; no significant immune alterations were found in blood.
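
The gene rankings above rest on mean absolute SHAP values. As a minimal sketch of that quantity (not the authors' pipeline), the snippet below computes exact Shapley values by enumerating feature coalitions for a tiny linear scorer, then averages their absolute values per feature. The gene names, weights, and samples are hypothetical placeholders, and the value function (replacing absent features with background rows) is one common choice that is assumed here.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition are replaced by background rows and the
    prediction is averaged (a marginal value function, assumed here).
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def mean_abs_shap(predict, samples, background):
    """Global importance: mean absolute Shapley value per feature."""
    n = len(samples[0])
    totals = [0.0] * n
    for x in samples:
        for j, p in enumerate(shapley_values(predict, x, background)):
            totals[j] += abs(p)
    return [t / len(samples) for t in totals]


# Hypothetical toy data: three made-up features, NOT values from the paper.
genes = ["gene_a", "gene_b", "gene_c"]
weights = [0.8, -0.5, 0.1]


def toy_model(row):
    # Linear score standing in for a trained classifier's output.
    return sum(w * v for w, v in zip(weights, row))


samples = [[1.0, 0.2, 0.5], [0.1, 0.9, 0.4], [0.7, 0.3, 0.8], [0.2, 0.6, 0.1]]
importance = mean_abs_shap(toy_model, samples, background=samples)
ranking = sorted(zip(genes, importance), key=lambda pair: pair[1], reverse=True)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

Enumeration costs grow exponentially with the number of features, so this only suits toy inputs; practical analyses typically rely on approximations such as those in the `shap` library.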

## Abstract

Osteoarthritis (OA) is a chronic joint disorder characterized by pain, reduced mobility, and structural degeneration. Despite its complex etiology and multi-tissue involvement, the molecular mechanisms underlying OA remain poorly understood. This study aimed to identify tissue-specific diagnostic biomarkers using an integrative framework combining multiple machine learning (ML) algorithms and SHapley Additive exPlanations (SHAP). Gene expression profiles from cartilage, synovium, and peripheral blood were retrieved from the GEO database. Differentially expressed genes (DEGs) were identified in each tissue, followed by feature selection using Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine Recursive Feature Elimination (SVM-RFE), and Random Forest (RF). Functional enrichment, gene set variation analysis (GSVA), and immune infiltration analyses were conducted. Ten ML models were constructed to evaluate diagnostic performance. A total of 8, 28, and 61 DEGs were identified in cartilage, synovium, and blood, respectively. Enrichment analysis highlighted key roles for inflammatory signaling, metabolism, and immune pathways. Biomarkers identified included CSN1S1, ABCA6, RARRES1, NPTX2 (cartilage); SCRG1, CXCL2, PTGDS, CCL19, BGN, KLF9 (synovium); and GNL3L, C6orf111, NT5C3, ZNF148 (blood). Immune analysis indicated shifts in mast cells and CD8+ T cells in cartilage and in dendritic cells in synovium, while no significant immune alterations were found in blood. Diagnostic models demonstrated strong performance, with AUCs of 0.839 (cartilage), 0.934 (synovium), and 0.892 (blood). SHAP analysis was employed to interpret each model by quantifying the contribution of individual genes to the predicted outcome. In the optimal cartilage model, CSN1S1 and ABCA6 were the most influential features, with mean absolute SHAP values of 0.146 and 0.122, respectively. For synovium, SCRG1 (0.111) and CXCL2 (0.097) were the top contributors, while in blood, GNL3L (0.148) and C6orf111 (0.143) showed the highest predictive importance. These results underscore the interpretability of the models and support the functional relevance of the selected biomarkers. Collectively, this study provides a robust ML-based framework for identifying and interpreting reliable OA biomarkers across multiple tissues, offering valuable insights into disease mechanisms and supporting the development of diagnostic tools.
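
The abstract reports diagnostic performance as AUC. As an illustrative sketch of what that number measures (not the evaluation code used in the study), AUC can be computed as the fraction of positive/negative sample pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Equivalent to the normalised Mann-Whitney U statistic; ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = OA, 0 = control) and model scores, not data from the paper.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.91, 0.75, 0.40, 0.35, 0.62, 0.10, 0.83, 0.40]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Under this reading, an AUC of 0.934 would mean that a randomly chosen OA sample outscores a randomly chosen control sample in roughly 93% of pairs.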

## Linked entities

- **Genes:** CSN1S1 (casein alpha s1) [NCBI Gene 1446], ABCA6 (ATP binding cassette subfamily A member 6) [NCBI Gene 23460], RARRES1 (retinoic acid receptor responder 1) [NCBI Gene 5918], NPTX2 (neuronal pentraxin 2) [NCBI Gene 4885], SCRG1 (stimulator of chondrogenesis 1) [NCBI Gene 11341], CXCL2 (C-X-C motif chemokine ligand 2) [NCBI Gene 2920], PTGDS (prostaglandin D2 synthase) [NCBI Gene 5730], CCL19 (C-C motif chemokine ligand 19) [NCBI Gene 6363], BGN (biglycan) [NCBI Gene 633], KLF9 (KLF transcription factor 9) [NCBI Gene 687], GNL3L (G protein nucleolar 3 like) [NCBI Gene 54552], PNISR (PNN interacting serine and arginine rich protein) [NCBI Gene 25957], NT5C3A (5'-nucleotidase, cytosolic IIIA) [NCBI Gene 51251], ZNF148 (zinc finger protein 148) [NCBI Gene 7707]
- **Diseases:** Osteoarthritis (MONDO:0005178)

## Full-text entities

- **Genes:** APOD (apolipoprotein D) [NCBI Gene 347], ADRB2 (adrenoceptor beta 2) [NCBI Gene 154] {aka ADRB2R, ADRBR, ARB2, B2AR, BAR, BETA2AR}, ABCA6 (ATP binding cassette subfamily A member 6) [NCBI Gene 23460] {aka EST155051}, REN (renin) [NCBI Gene 5972] {aka ADTKD4, HNFJ2, RTD}, PNISR (PNN interacting serine and arginine rich protein) [NCBI Gene 25957] {aka C6orf111, HSPC306, SFRS18, SRrp130, bA98I9.2}, MAFF (MAF bZIP transcription factor F) [NCBI Gene 23764] {aka U-MAF, hMafF}, HSPA1B (heat shock protein family A (Hsp70) member 1B) [NCBI Gene 3304] {aka HSP70-1, HSP70-1B, HSP70-2, HSP70.1, HSP70.2, HSP72}, HK3 (hexokinase 3) [NCBI Gene 3101] {aka HKIII, HXK3}, ID3 (inhibitor of DNA binding 3) [NCBI Gene 3399] {aka HEIR-1, bHLHb25}, MMP1 (matrix metallopeptidase 1) [NCBI Gene 4312] {aka CLG}, NFKB1 (nuclear factor kappa B subunit 1) [NCBI Gene 4790] {aka CVID12, EBP-1, KBF1, NF-kB, NF-kB1, NF-kappa-B1}, ZNF148 (zinc finger protein 148) [NCBI Gene 7707] {aka BERF-1, BFCOL1, GDACCF, HT-BETA, ZBP-89, ZFP148}, ZBTB16 (zinc finger and BTB domain containing 16) [NCBI Gene 7704] {aka PLZF, ZNF145}, MXRA5 (matrix remodeling associated 5) [NCBI Gene 25878], GNL3L (G protein nucleolar 3 like) [NCBI Gene 54552] {aka GNL3B}, ANGPTL7 (angiopoietin like 7) [NCBI Gene 10218] {aka AngX, CDT6, dJ647M16.1}, NPTX2 (neuronal pentraxin 2) [NCBI Gene 4885] {aka NARP, NP-II, NP2}, PTGDS (prostaglandin D2 synthase) [NCBI Gene 5730] {aka L-PGDS, LPGDS, PDS, PGD2, PGDS, PGDS2}, IL1B (interleukin 1 beta) [NCBI Gene 3553] {aka IL-1, IL1-BETA, IL1F2, IL1beta}, RARRES1 (retinoic acid receptor responder 1) [NCBI Gene 5918] {aka LXNL, PERG-1, TIG1}, SERPINF1 (serpin family F member 1) [NCBI Gene 5176] {aka EPC-1, OI12, OI6, PEDF, PIG35}, RPS4Y1 (ribosomal protein S4 Y-linked 1) [NCBI Gene 6192] {aka RPS4Y, S4}, EGR1 (early growth response 1) [NCBI Gene 1958] {aka AT225, G0S30, KROX-24, NGFI-A, TIS8, ZIF-268}, FBP1 (fructose-bisphosphatase 1) [NCBI Gene 2203] {aka FBP}, SCRG1 (stimulator of chondrogenesis 1) [NCBI Gene 11341] {aka SCRG-1, lincSCRG1}, JCHAIN (joining chain of multimeric IgA and IgM) [NCBI Gene 3512] {aka IGCJ, IGJ, JCH}, MMP3 (matrix metallopeptidase 3) [NCBI Gene 4314] {aka CHDS6, MMP-3, SL-1, STMY, STMY1, STR1}, IL22 (interleukin 22) [NCBI Gene 50616] {aka IL-21, IL-22, IL-D110, IL-TIF, ILTIF, TIFIL-23}, CCR7 (C-C motif chemokine receptor 7) [NCBI Gene 1236] {aka BLR2, CC-CKR-7, CCR-7, CD197, CDw197, CMKBR7}, GPR18 (G protein-coupled receptor 18) [NCBI Gene 2841] {aka DRV2}, CCL19 (C-C motif chemokine ligand 19) [NCBI Gene 6363] {aka CKb11, ELC, MIP-3b, MIP3B, SCYA19}, ANOS1 (anosmin 1) [NCBI Gene 3730] {aka ADMLX, HH1, HHA, KAL, KAL1, KALIG-1}, CX3CR1 (C-X3-C motif chemokine receptor 1) [NCBI Gene 1524] {aka CCRL1, CMKBRL1, CMKDR1, GPR13, GPRV28, V28}, NT5C3A (5'-nucleotidase, cytosolic IIIA) [NCBI Gene 51251] {aka CNSHA8, NT5C3, P5'N-1, P5N-1, PN-I, POMP}, CXCL2 (C-X-C motif chemokine ligand 2) [NCBI Gene 2920] {aka CINC-2a, GRO2, GROb, MGSA-b, MIP-2a, MIP2}, NELL1 (neural EGFL like 1) [NCBI Gene 4745] {aka IDH3GL}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, FAM43A (family with sequence similarity 43 member A) [NCBI Gene 131583], ZNF486 (zinc finger protein 486) [NCBI Gene 90649] {aka KRBO2}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, ITGB2 (integrin subunit beta 2) [NCBI Gene 3689] {aka CD18, LAD, LCAMB, LFA-1, MAC-1, MF17}, FKBP5 (FKBP prolyl isomerase 5) [NCBI Gene 2289] {aka AIG6, FKBP51, FKBP54, P54, PPIase, Ptg-10}, IFI16 (interferon gamma inducible protein 16) [NCBI Gene 3428] {aka IFNGIP1, PYHIN2}, ADH1C (alcohol dehydrogenase 1C (class I), gamma polypeptide) [NCBI Gene 126] {aka ADH3}, NFIL3 (nuclear factor, interleukin 3 regulated) [NCBI Gene 4783] {aka E4BP4, IL3BP1, NF-IL3A, NFIL3A}, DCN (decorin) [NCBI Gene 1634] {aka CSCD, DSPG2, PG40, PGII, PGS2, SLRR1B}, BGN (biglycan) [NCBI Gene 633] {aka DSPG1, MRLS, PG-S1, PGI, SEMDX, SLRR1A}, TGFB1 (transforming growth factor beta 1) [NCBI Gene 7040] {aka CAEND1, CED, DPD1, IBDIMDE, LAP, TGF-beta1}, KLF9 (KLF transcription factor 9) [NCBI Gene 687] {aka BTEB, BTEB1}, CYP4B1 (cytochrome P450 family 4 subfamily B member 1) [NCBI Gene 1580] {aka CYPIVB1, P-450HP}, ATF3 (activating transcription factor 3) [NCBI Gene 467], CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, LRRC15 (leucine rich repeat containing 15) [NCBI Gene 131578] {aka LIB}, MTOR (mechanistic target of rapamycin kinase) [NCBI Gene 2475] {aka FRAP, FRAP1, FRAP2, RAFT1, RAPT1, SKS}, BDKRB1 (bradykinin receptor B1) [NCBI Gene 623] {aka B1BKR, B1R, BKB1R, BKR1, BRADYB1}, STMN2 (stathmin 2) [NCBI Gene 11075] {aka SCG10, SCGN10}, ANGPTL2 (angiopoietin like 2) [NCBI Gene 23452] {aka ARP2, HARP}, CSN1S1 (casein alpha s1) [NCBI Gene 1446] {aka CASA, CSN1}, IL17A (interleukin 17A) [NCBI Gene 3605] {aka CTLA-8, CTLA8, IL-17, IL-17A, IL17, ILA17}
- **Diseases:** cartilage degeneration (MESH:D002357), knee joint dysfunction (MESH:D000092443), metabolic dysfunction (MESH:D008659), OA (MESH:D010003), acute myeloid leukemia (MESH:D015470), obesity (MESH:D009765), structural degeneration (MESH:D020914), postoperative pain (MESH:D010149), psychiatric disorders (MESH:D001523), Alzheimer's disease (MESH:D000544), pain (MESH:D010146), cardiometabolic comorbidities (MESH:D024821), inflammation (MESH:D007249), degenerative articular disease (MESH:D019636), Disease (MESH:D004194), disability (MESH:D009069), functional impairment (MESH:D003072), hemolytic anemia (MESH:D000743), chronic pain (MESH:D059350), Measles (MESH:D008457), joint disease (MESH:D007592), synovitis (MESH:D013585), gastrointestinal complications (MESH:D005767), hypertrophy (MESH:D006984), Atherosclerosis (MESH:D050197), pulmonary arterial hypertension (MESH:D000081029), colorectal cancer (MESH:D015179), loss of mobility (MESH:D014086), articular stiffness (MESH:C566112), Legionellosis (MESH:D007876), rheumatoid arthritis (MESH:D001172), inflammatory arthritides (MESH:D001168)
- **Chemicals:** galactose (MESH:D005690), cholesterol (MESH:D002784), branched-chain amino acids (MESH:D000597), terpene (MESH:D013729), arachidonic acid (MESH:D016718), valine (MESH:D014633), nicotinamide (MESH:D009536), pentose phosphate (MESH:D010428), nitrogen (MESH:D009584), chondroitin sulfate glycosaminoglycan (MESH:C038142), glycosphingolipid (MESH:D006028), glycan (MESH:D011134), glycerophospholipid (MESH:D020404), acid (MESH:D000143), reactive oxygen species (MESH:D017382), glucose (MESH:D005947), tryptophan (MESH:D014364), lysine (MESH:D008239), sucrose (MESH:D013395), lipid (MESH:D008055), ATP (MESH:D000255), starch (MESH:D013213), phenylalanine (MESH:D010649), niacin (MESH:D009525), unsaturated fatty acids (MESH:D005231), glycosaminoglycan (MESH:D006025), heparan sulfate (MESH:D006497), H2O2 (MESH:D006861), O-glycan (-)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Mutations:** p.F149del

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12970860/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12970860/full.md

## References

61 references — full list in the complete paper: https://tomesphere.com/paper/PMC12970860/full.md

---
Source: https://tomesphere.com/paper/PMC12970860