# Unveiling the hub genes in the SIGLECs family in colon adenocarcinoma with machine learning

**Authors:** Tiantian Li, Ji Yao

PMC · DOI: 10.3389/fgene.2024.1375100 · 2024-04-08

## TL;DR

This study identifies key genes in the SIGLECs family linked to colon cancer and uses machine learning to better understand their roles in cancer progression and immune response.

## Contribution

The study introduces a novel combination of machine learning techniques to uncover hub genes and subtypes in colon adenocarcinoma.

## Key findings

- SIGLEC14 significantly affects overall survival in colon adenocarcinoma patients.
- PCA improves sensitivity to overall survival (OS) and disease-free interval (DFI) in COAD prognosis.
- SIGLEC1, SIGLEC15, and CD22 are identified as hub genes in COAD through differential expression analysis and PCA.

## Abstract

Despite the recognized roles of sialic acid-binding Ig-like lectins (SIGLECs) in endocytosis and immune regulation across cancers, their molecular intricacies in colon adenocarcinoma (COAD) are underexplored. The complicated interactions between different SIGLECs are likewise a crucial but open question.

We investigate the correlation between SIGLECs and properties such as cancer status, prognosis, clinical features, functional enrichment, immune cell abundances, immune checkpoints, and pathways. To fully understand the co-evolution of multiple SIGLECs and subtract its leading effect, we additionally apply three unsupervised machine learning algorithms, namely Principal Component Analysis (PCA), Self-Organizing Maps (SOM), and K-means, and two supervised learning algorithms, namely Least Absolute Shrinkage and Selection Operator (LASSO) and a neural network (NN).
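As a rough illustration of the PCA step named above, the sketch below computes principal-component scores from a samples-by-genes expression matrix using NumPy's SVD. It is not the authors' pipeline: the function name `pca_scores`, the synthetic data, and the matrix dimensions (200 samples, 14 genes) are all assumptions made for the example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Return (scores, explained_variance_ratio) for a samples-by-genes matrix X."""
    # Standardize each gene (column) so that no single gene dominates by scale.
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    Z = Xc / std
    # SVD of the standardized matrix: rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    var = S ** 2 / (Z.shape[0] - 1)
    ratio = var / var.sum()
    return scores, ratio[:n_components]

# Synthetic stand-in for an expression matrix (hypothetical, not real data):
# every gene shares one latent factor, mimicking correlated family members.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 14  # arbitrary sizes chosen for the example
shared = rng.normal(size=(n_samples, 1))
X = 0.8 * shared + 0.6 * rng.normal(size=(n_samples, n_genes))

scores, ratio = pca_scores(X, n_components=5)
print(scores.shape, np.round(ratio, 3))
```

Each column of `scores` is one component per sample; such columns could then be tested against survival or stage in place of individual gene expression values, which is the kind of use the abstract describes.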

We find significantly lower expression levels in COAD samples, together with a systematic enhancement in the correlations between distinct SIGLECs. We demonstrate that SIGLEC14 significantly affects Overall Survival (OS) according to the hazard ratio, while using PCA further enhances the sensitivity to both OS and Disease Free Interval (DFI). We find that no single SIGLEC correlates with cancer stage, whereas the correlation improves significantly when PCA is used. We further identify SIGLEC-1, SIGLEC-15, and CD22 as hub genes in COAD through Differentially Expressed Genes (DEGs), which is consistent with our PCA-identified key components PC-1, PC-2, and PC-5 when both the correlation with cancer status and immune cell abundance are considered. As an extension, we use SOM to visualize the SIGLECs and show the similarities and differences between COAD patients. SOM can also help define subsamples according to SIGLEC status, with corresponding changes in, for instance, immune cells and cancer T-stage.
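To make the SOM step more concrete, here is a minimal self-organizing map written in plain NumPy. It is a sketch under stated assumptions, not the authors' implementation: the grid size, learning-rate and neighborhood schedules, the function names `train_som` and `map_samples`, and the two-group synthetic data are all hypothetical.

```python
import numpy as np

def train_som(X, grid=(4, 4), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns (weights, grid coordinates)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node weights from randomly chosen samples.
    weights = X[rng.integers(0, len(X), size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # neighborhood shrinks toward 0.5
        x = X[rng.integers(0, len(X))]
        # Best-matching unit: the node whose weight vector is closest to x.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Gaussian neighborhood on the grid pulls nearby nodes toward x.
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights, coords

def map_samples(X, weights):
    """Index of the best-matching node for every sample."""
    d = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Hypothetical data: two well-separated groups of samples over six features.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=-2.0, scale=0.5, size=(50, 6))
group_b = rng.normal(loc=2.0, scale=0.5, size=(50, 6))
X = np.vstack([group_a, group_b])

weights, coords = train_som(X)
nodes = map_samples(X, weights)
print(np.bincount(nodes, minlength=len(weights)).reshape(4, 4))
```

The printed grid counts how many samples land on each node. Samples that share a node or sit on neighboring nodes have similar profiles, which is how a SOM can suggest subsamples whose clinical or immune features are then compared.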

We conclude that SIGLEC-1, SIGLEC-15, and CD22 are the most promising hub genes in the SIGLECs family for treating COAD. PCA significantly enhances the prognosis and clinical analyses, while SOM further unveils the transition phases or potential subtypes of COAD.

## Linked entities

- **Genes:** SIGLEC14 (sialic acid binding Ig like lectin 14) [NCBI Gene 100049587], SIGLEC1 (sialic acid binding Ig like lectin 1) [NCBI Gene 6614], SIGLEC15 (sialic acid binding Ig like lectin 15) [NCBI Gene 284266], CD22 (CD22 molecule) [NCBI Gene 933]
- **Diseases:** colon adenocarcinoma (MONDO:0002271), COAD (MONDO:0002271)

## Full-text entities

- **Genes:** SIGLEC14 (sialic acid binding Ig like lectin 14) [NCBI Gene 100049587], CD22 (CD22 molecule) [NCBI Gene 933] {aka SIGLEC-2, SIGLEC2}
- **Diseases:** COAD (MESH:D003110), cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11033367/full.md

---
Source: https://tomesphere.com/paper/PMC11033367