# Gradient Boosting Prediction of Overlapping Genes From Weighted Co-expression and Differential Gene Expression Analysis of Wnt Pathway: An Artificial Intelligence-Based Bioinformatics Study

**Authors:** Pradeep Kumar Yadalam, Ramya R, Raghavendra Vamsi Anegundi

PMC · DOI: 10.7759/cureus.67207 · 2024-08-19

## TL;DR

This study uses machine learning to predict overlapping genes in the Wnt signaling pathway, which is important for bone formation and cell regulation.

## Contribution

The study combines gradient boosting with weighted co-expression network analysis (WGCNA) and differential gene expression analysis to predict overlapping genes in the Wnt pathway.

## Key findings

- Gradient boosting achieved 78.9% accuracy in predicting overlapping genes in the Wnt pathway.
- The model showed high precision but low recall: its positive predictions were usually correct, but it missed many true positives.
- WGCNA and differential expression analysis helped identify key gene clusters and hub genes related to Wnt signaling.
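The hub-gene idea in the last finding can be illustrated with a small sketch. It assumes the usual WGCNA definitions: a module eigengene is the first principal component of the module's standardized expression matrix, and module membership is the Pearson correlation between a gene's profile and that eigengene. The gene names and expression values are invented for illustration and are not taken from GSE251951. The power-iteration routine is a dependency-free stand-in, not the iDEP or WGCNA implementation.

```python
import math

def standardize(values):
    """Center a profile to mean 0 and scale it to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def module_eigengene(expr, iterations=200):
    """Approximate the first principal component (one value per sample)
    of the standardized genes-by-samples matrix via power iteration."""
    z = [standardize(profile) for profile in expr.values()]
    n_samples = len(z[0])
    # Start from the summed profile so the sign follows average expression.
    eigengene = [sum(row[j] for row in z) for j in range(n_samples)]
    for _ in range(iterations):
        loadings = [sum(row[j] * eigengene[j] for j in range(n_samples)) for row in z]
        eigengene = [sum(w * row[j] for w, row in zip(loadings, z))
                     for j in range(n_samples)]
        norm = math.sqrt(sum(v * v for v in eigengene))
        eigengene = [v / norm for v in eigengene]
    return eigengene

# Hypothetical module: three co-expressed genes and one poorly connected gene.
module_expr = {
    "GeneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GeneB": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1],
    "GeneC": [0.9, 2.2, 2.8, 4.1, 5.2, 5.8],
    "GeneD": [3.0, 1.0, 4.0, 1.5, 3.5, 2.0],
}

eigengene = module_eigengene(module_expr)
membership = {gene: pearson(profile, eigengene) for gene, profile in module_expr.items()}

# Hub candidates are the genes with the largest absolute module membership.
ranked = sorted(membership.items(), key=lambda item: abs(item[1]), reverse=True)
for gene, kme in ranked:
    print(f"{gene}: module membership = {kme:.3f}")
```

In this toy module the three co-expressed genes rank at the top and the noisy gene ranks last, which is the pattern a hub-gene ranking is meant to expose.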

## Abstract

Introduction

The Wnt (wingless-related integration site) signaling pathway is crucial for bone formation and remodeling, regulating the commitment of mesenchymal stem cells (MSCs) to the osteoblastic lineage. It triggers the transcriptional activation of Wnt target genes and promotes osteoblast proliferation and survival. Weighted gene co-expression network analysis (WGCNA) and differential gene expression analysis help researchers understand gene roles. Gradient boosting, a machine learning technique, can help characterize the genetic and molecular mechanisms associated with overlapping genes, supporting studies of gene regulation and functional genomics. This study aims to predict overlapping genes in the Wnt signaling pathway.

Methods

Differential gene expression analysis was performed on the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) dataset GSE251951, focusing on the effect of Wnt signaling on treatment. WGCNA modules were analyzed with the iDEP tool to identify clusters of interconnected genes. Hub genes were identified by calculating module eigengenes, correlating them with external traits, and ranking genes by module membership values. Gradient boosting, an ensemble learning method that iteratively adjusts its predictions using the loss gradient and a learning rate, was used to build the predictive model, and its performance was evaluated with accuracy, precision, recall, and F1 score.
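The gradient boosting step can be sketched from scratch to show what "adjusting predictions based on gradient and learning rate" means in practice. This is a minimal illustration, not the authors' pipeline: the two features (an expression fold change and a module membership score), the labels, and the hyperparameters are all hypothetical. Each round fits a one-split regression stump to the negative gradient of the log loss (the residual `y - p`) and adds it to the running score, scaled by the learning rate. Leaf values are plain residual means, a simplification of the Newton-style leaf updates used by production libraries.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, scores):
    """Mean binary cross-entropy of raw scores against 0/1 labels."""
    eps = 1e-12
    return -sum(yi * math.log(sigmoid(s) + eps) + (1 - yi) * math.log(1 - sigmoid(s) + eps)
                for yi, s in zip(y, scores)) / len(y)

def fit_stump(X, residuals):
    """Pick the single threshold split that best fits the residuals (least squares)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_value(stump, row):
    f, t, lm, rm = stump
    return lm if row[f] <= t else rm

def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.3):
    pos_rate = sum(y) / len(y)
    init = math.log(pos_rate / (1 - pos_rate))  # start from the log-odds of the base rate
    scores = [init] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the score is y - p.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row) for s, row in zip(scores, X)]
    return {"init": init, "lr": learning_rate, "stumps": stumps}

def predict_proba(model, row):
    score = model["init"] + sum(model["lr"] * stump_value(s, row) for s in model["stumps"])
    return sigmoid(score)

# Hypothetical features per gene: [abs log2 fold change, module membership].
X = [[2.1, 0.9], [1.8, 0.8], [2.5, 0.7], [1.6, 0.85], [0.2, 0.3],
     [0.4, 0.1], [0.1, 0.5], [0.6, 0.2], [1.9, 0.2], [0.3, 0.9]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = labeled as an overlapping gene (made-up labels)

model = fit_gradient_boosting(X, y)
initial_loss = log_loss(y, [model["init"]] * len(y))
final_scores = [math.log(p / (1 - p)) for p in (predict_proba(model, row) for row in X)]
final_loss = log_loss(y, final_scores)
accuracy = sum((predict_proba(model, row) > 0.5) == bool(yi) for row, yi in zip(X, y)) / len(y)
print(f"log loss: {initial_loss:.3f} -> {final_loss:.3f}, training accuracy: {accuracy:.2f}")
```

A smaller learning rate makes each correction more cautious and usually needs more rounds. In practice a library implementation with held-out evaluation would replace this training-set check.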

Results

Gene modules were identified from the dendrogram using the "Dynamic TreeCut" algorithm, which helps researchers interpret gene modules and biological processes, identify co-expressed genes, and discover new pathways. The confusion matrix covers 88 cases, comparing actual and predicted classes. The gradient boosting model achieves 78.9% accuracy in predicting Wnt pathway overlapping genes, with respectable area under the curve (AUC) and classification accuracy values. It correctly predicts 73.9% of samples, with high precision and low recall.
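To make the relationship between a confusion matrix and the reported metrics concrete, the sketch below computes accuracy, precision, recall, and F1 from four cell counts. The counts are invented for illustration and are not the paper's confusion matrix. They were chosen only so that they sum to 88 cases with 65 correct (65/88 ≈ 73.9%) and show the high-precision, low-recall pattern described above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive standard binary classification metrics from confusion matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)  # share of predicted positives that are real
    recall = tp / (tp + fn)     # share of real positives that were found
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts (NOT from the paper): few positives are predicted,
# most of them correctly, while many true positives are missed.
metrics = classification_metrics(tp=10, fp=2, fn=21, tn=55)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

With these counts precision is 10/12 while recall is only 10/31, so accuracy alone would hide how many positive cases the classifier misses.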

Conclusion

Future research should refine differential expression analysis and WGCNA to identify key Wnt pathway genes, improve model sensitivity and specificity, tune hyperparameters, perform validation experiments, and use larger datasets.

## Full-text entities

- **Genes:** Gpc3 (glypican 3) [NCBI Gene 14734] {aka OCI-5}, LRP5 (LDL receptor related protein 5) [NCBI Gene 4041] {aka BMND1, EVR1, EVR4, HBM, LR3, LRP-5}, Aadat (aminoadipate aminotransferase) [NCBI Gene 23923] {aka Aadt, KATII, Kat2, Kyat2, mKat-2}, TrnS1 (tRNA-Ser) [NCBI Gene 17742], Nat2 (N-acetyltransferase 2 (arylamine N-acetyltransferase)) [NCBI Gene 17961], CTNNB1 (catenin beta 1) [NCBI Gene 1499] {aka CTNNB, EVR7, MRD19, NEDSDV, armadillo}, BMP1 (bone morphogenetic protein 1) [NCBI Gene 649] {aka OI13, PCOLC, PCP, TLD}, Mir361 (microRNA 361) [NCBI Gene 723850] {aka Mirn361, mir-361, mmu-mir-361}, TRNV (tRNA-Val) [NCBI Gene 4577] {aka MTTV}, Masp1 (MBL associated serine protease 1) [NCBI Gene 17174] {aka CCPII, Crarf, Masp1/3}, Wnt3a (wingless-type MMTV integration site family, member 3A) [NCBI Gene 22416] {aka Wnt-3a, vt}, Mir152 (microRNA 152) [NCBI Gene 387170] {aka Mirn152, mir-152, mmu-mir-152}, TNFRSF11B (TNF receptor superfamily member 11b) [NCBI Gene 4982] {aka OCIF, OPG, PDB5, TR1}, LRP6 (LDL receptor related protein 6) [NCBI Gene 4040] {aka ADCAD2, EVR8, OPTA4, STHAG7}, Apof (apolipoprotein F) [NCBI Gene 103161] {aka LVIF}, SOST (sclerostin) [NCBI Gene 50964] {aka CDD, DAND6, SOST1, VBCH}
- **Diseases:** OPPG (MESH:D010024), blindness (MESH:D001766), low bone mineral density (MESH:D001851), osteoarthritis (MESH:D010003), low BMD (MESH:D020388), limb abnormalities (MESH:D001259), neonatal death (MESH:D066087), fracture (MESH:D050723), sepsis (MESH:D018805), severe acute respiratory distress syndrome (MESH:D045169), ARDS (MESH:D012128), bone mass disorders (MESH:D001847)
- **Chemicals:** DGE (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11410066/full.md

---
Source: https://tomesphere.com/paper/PMC11410066