# Goistrat: gene-of-interest-based sample stratification for the evaluation of functional differences

**Authors:** Carlos Uziel Pérez Malla, Jessica Kalla, Andreas Tiefenbacher, Gabriel Wasinger, Kilian Kluge, Gerda Egger, Raheleh Sheibani-Tezerji

PMC · DOI: 10.1186/s12859-025-06109-0 · BMC Bioinformatics · 2025-04-05

## TL;DR

This paper introduces a workflow that stratifies cancer samples by the expression of a specific gene of interest (GOI), so that the functional role of that gene's expression variation in disease and treatment can be studied.

## Contribution

A novel workflow that combines MSigDB curated gene sets, graph machine learning and ensemble clustering to stratify cancer samples by gene expression for downstream functional analysis.

## Key findings

- Across the top 8 TCGA datasets, the workflow outperformed existing methods at separating samples for the study of biological differences.
- Applied to nearly a thousand prostate cancer samples with varying FOLH1 expression, the workflow identified pathways such as the PI3K-AKT-mTOR gene sets, as well as signatures linked to tumor aggressiveness.
- The approach identifies disease-relevant functions of genes of interest in large datasets.

## Abstract

Understanding the impact of gene expression in pathological processes, such as carcinogenesis, is crucial for understanding the biology of cancer and advancing personalised medicine. Yet current methods lack biologically informed omics approaches to stratify cancer patients effectively, limiting our ability to dissect the underlying molecular mechanisms.

To address this gap, we present a novel workflow for the stratification and further analysis of multi-omics samples with matched RNA-Seq data that relies on MSigDB curated gene sets, graph machine learning and ensemble clustering. We compared the performance of our workflow on the top 8 TCGA datasets and showed its clear superiority in separating samples for the study of biological differences. We also applied our workflow to analyse nearly a thousand prostate cancer samples, focusing on the varying expression of the FOLH1 gene, and identified specific pathways such as the PI3K-AKT-mTOR gene sets as well as signatures linked to prostate tumour aggressiveness.

Our comprehensive approach provides a novel tool to identify disease-relevant functions of genes of interest (GOI) in large datasets. This integrated approach offers a valuable framework for understanding the role of the expression variation of a GOI in complex diseases and for informing targeted therapeutic strategies.
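The abstract names ensemble clustering as one step of the workflow, but this summary does not describe the authors' specific variant. As a generic illustration only, the sketch below shows one common form of ensemble (consensus) clustering, evidence accumulation: several base clusterings of the same samples are combined into a co-association matrix (the fraction of runs in which two samples share a cluster), and consensus groups are taken as connected components of the graph linking pairs whose co-association exceeds a threshold. The function names `co_association` and `consensus_labels` and the default threshold of 0.5 are hypothetical choices for this sketch, not part of the published tool.

```python
import numpy as np


def co_association(labelings):
    """Fraction of base clusterings in which each pair of samples shares a cluster.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    labels = np.asarray(labelings)  # shape: (n_clusterings, n_samples)
    n_runs, n_samples = labels.shape
    co = np.zeros((n_samples, n_samples))
    for run in labels:
        # Pairwise "same cluster" indicator for this run, accumulated as 0/1.
        co += run[:, None] == run[None, :]
    return co / n_runs


def consensus_labels(co, threshold=0.5):
    """Label connected components of the graph with edges where co-association > threshold.

    Equivalent to cutting a single-linkage dendrogram built on 1 - co.
    """
    n = co.shape[0]
    labels = -np.ones(n, dtype=int)  # -1 marks "not yet assigned"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal of all samples reachable from `start`.
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(co[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three base clusterings of six samples; cluster IDs need not agree across runs.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
co = co_association(runs)
print(consensus_labels(co))  # [0 0 0 1 1 1]
```

Because the co-association matrix only records whether two samples are grouped together, it is insensitive to how each base clustering numbers its clusters, which is why the second run's swapped IDs do not matter.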

## Linked entities

- **Genes:** FOLH1 (folate hydrolase 1) [NCBI Gene 2346]
- **Diseases:** cancer (MONDO:0004992), prostate cancer (MONDO:0005159)

## Full-text entities

- **Genes:** FOLH1 (folate hydrolase 1) [NCBI Gene 2346] {aka FGCP, FOLH, GCP2, GCPII, NAALAD1, PSM}, MTOR (mechanistic target of rapamycin kinase) [NCBI Gene 2475] {aka FRAP, FRAP1, FRAP2, RAFT1, RAPT1, SKS}, AKT1 (AKT serine/threonine kinase 1) [NCBI Gene 207] {aka AKT, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA}
- **Diseases:** cancer (MESH:D009369), prostate cancer (MESH:D011471), carcinogenesis (MESH:D063646)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11971790/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11971790/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC11971790/full.md

---
Source: https://tomesphere.com/paper/PMC11971790