# Protocol for assessing distances in pathway space for classifier feature sets from machine learning methods

**Authors:** Bahar Tercan, Victor H. Apolonio, Vinicius S. Chagas, Christopher K. Wong, Jordan A. Lee, Christina Yau, Christopher C. Benz, Joshua M. Stuart, Brian J. Karlberg, Kyle Ellrott, Jasleen K. Grewal, Steven J.M. Jones, Theo A. Knijnenberg, Mauro A.A. Castro, Vinicius S. Chagas, Victor H. Apolonio, Verena Friedl, Joshua M. Stuart, Vladislav Uzunangelov, Christopher K. Wong, Rameen Beroukhim, Andrew D. Cherniack, Galen F. Gao, Gad Getz, Stephanie H. Hoyt, Xavier Loinaz, Whijae Roh, Chip Stewart, Lindsay Westlake, Christopher C. Benz, Jasleen K. Grewal, Steven J.M. Jones, A. Gordon Robertson, Samantha J. Caesar-Johnson, John A. Demchok, Ina Felau, Anab Kemal, Roy Tarnuzzer, Peggy I. Wang, Zhining Wang, Liming Yang, Jean C. Zenklusen, Rehan Akbani, Bradley M. Broom, Zhenlin Ju, Andre Schultz, Akinyemi I. Ojesina, Katherine A. Hoadley, Avantika Lal, Daniele Ramazzotti, Chen Wang, Alexander J. Lazar, Lewis R. Roberts, Bahar Tercan, Taek-Kyun Kim, Ilya Shmulevich, Paulos Charonyktakis, Vincenzo Lagani, Ioannis Tsamardinos, Esther Drill, Ronglai Shen, Martin L. Ferguson, Kami E. Chiotti, Kyle Ellrott, Brian J. Karlberg, Jordan A. Lee, Eve Lowenstein, Paul T. Spellman, Adam Struck, Christina Yau, D. Neil Hayes, Toshinori Hinoue, Hui Shen, Peter W. Laird, Jean C. Zenklusen, A. Gordon Robertson, Peter W. Laird, Andrew D. Cherniack, Mauro A.A. Castro

PMC · DOI: 10.1016/j.xpro.2025.103681 · STAR Protocols · 2025-03-18

## TL;DR

This paper introduces a protocol that uses pathway space analysis to assess whether gene sets selected by different machine learning methods are biologically related.

## Contribution

The paper introduces a protocol, built on the PathwaySpace R package, for evaluating whether distinct gene sets are biologically related in pathway space.

## Key findings

- The protocol tests whether seemingly different gene sets are biologically related in pathway space.
- Steps are provided for building a pathway space and calculating pathway distances between gene sets.
- Density plots and pathway distance metrics help visualize and quantify relationships between gene sets.
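The idea of a pathway distance between two gene sets can be illustrated with a minimal sketch. It is not the PathwaySpace implementation or its API: the toy graph, the node labels, the choice of metric (mean shortest-path length from each gene in one set to the nearest gene in the other), and the permutation null are all assumptions made for illustration.

```python
from collections import deque
import random

# Hypothetical toy network: a simple chain a-b-c-d-e-f-g-h (not real interactions).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
         ("e", "f"), ("f", "g"), ("g", "h")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def set_distance(graph, set_a, set_b):
    """Mean over genes in set_a of the path length to the nearest gene in set_b."""
    total = 0
    for gene in set_a:
        dist = bfs_distances(graph, gene)
        # Unreachable genes count as infinitely far away.
        total += min(dist.get(g, float("inf")) for g in set_b)
    return total / len(set_a)


def permutation_pvalue(graph, set_a, set_b, n_perm=1000, seed=0):
    """Fraction of random sets (same size as set_b) at least as close to set_a."""
    rng = random.Random(seed)
    observed = set_distance(graph, set_a, set_b)
    nodes = sorted(graph)
    hits = 0
    for _ in range(n_perm):
        random_b = rng.sample(nodes, len(set_b))
        if set_distance(graph, set_a, random_b) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


observed, p_value = permutation_pvalue(graph, ["a", "b"], ["c", "d"], n_perm=200)
print(observed, p_value)
```

A small observed distance together with a small permutation p-value would suggest that the two sets sit closer in the network than random sets of the same size. The metric here is asymmetric (A to B differs from B to A), which is one of several design choices a real analysis would need to settle.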

## Abstract

As genes tend to be co-regulated as gene modules, feature selection in machine learning (ML) on gene expression data can be challenged by the complexity of gene regulation. Here, we present a protocol for reconciling differences in classifier features identified using different ML approaches. We describe steps for loading the PathwaySpace R package, preparing input for analysis, and creating density plots of gene sets. We then detail procedures for testing whether apparently distinct feature sets are related in pathway space.

For complete details on the use and execution of this protocol, please refer to Ellrott et al. (reference 1).

- Protocol to test whether distinct gene sets reflect related biology
- Steps for building a pathway space for graph- or network-based distance analysis
- Instructions for calculating a pathway distance metric for any pair of gene sets
- Guidance on exploring relationships between gene sets in pathway space
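To make the notion of a gene-set density plot in pathway space concrete, the sketch below projects a binary gene-set signal onto a 2D grid. It is an illustration only: the coordinates, the Gaussian decay kernel, the `decay` parameter, and the function name `density_grid` are hypothetical and do not reproduce the projection used by the PathwaySpace package.

```python
import math

# Hypothetical 2D layout of network nodes in the unit square.
coords = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (0.5, 0.5)}


def density_grid(coords, gene_set, n=5, decay=0.2):
    """Sum a Gaussian-decaying signal from each gene in gene_set over an n x n grid."""
    grid = [[0.0] * n for _ in range(n)]
    for i in range(n):          # row index maps to y
        for j in range(n):      # column index maps to x
            x, y = j / (n - 1), i / (n - 1)
            for gene in gene_set:
                gx, gy = coords[gene]
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                # Assumed kernel: signal falls off with squared distance.
                grid[i][j] += math.exp(-d2 / (2 * decay ** 2))
    return grid


grid = density_grid(coords, ["a", "c"], n=5)
for row in grid:
    print(" ".join(f"{value:.2f}" for value in row))
```

Grid cells near the positions of set members receive the highest values, so two gene sets whose members lie in the same network neighborhood would produce overlapping high-density regions even if they share no genes.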

Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.


## Full-text entities

- **Genes:** TRIM28 (tripartite motif containing 28) [NCBI Gene 10155] {aka KAP1, PPP1R157, RNF96, TF1B, TIF1B, TIF1beta}, PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha) [NCBI Gene 5290] {aka CCM4, CLAPO, CLOVE, CWS5, HMH, MCAP}, CTNNB1 (catenin beta 1) [NCBI Gene 1499] {aka CTNNB, EVR7, MRD19, NEDSDV, armadillo}, GNB1 (G protein subunit beta 1) [NCBI Gene 2782] {aka HG2A, MDS, MRD42}, ACTB (actin beta) [NCBI Gene 60] {aka BKRNS, BNS, BRWS1, CSMH, DDS1, PS1TP5BP1}, AKT1 (AKT serine/threonine kinase 1) [NCBI Gene 207] {aka AKT, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA}, A1BG (alpha-1-B glycoprotein) [NCBI Gene 1] {aka A1B, ABG, GAB, HYST2477}, RPS27A (ribosomal protein S27a) [NCBI Gene 6233] {aka CEP80, HEL112, S27A, UBA80, UBCEP1, UBCEP80}, TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, GRB2 (growth factor receptor bound protein 2) [NCBI Gene 2885] {aka ASH, EGFRBP-GRB2, Grb3-3, MST084, MSTP084, NCKAP2}, CRISP3 (cysteine rich secretory protein 3) [NCBI Gene 10321] {aka Aeg2, CRISP-3, CRS3, SGP28, dJ442L6.3}, PIK3R1 (phosphoinositide-3-kinase regulatory subunit 1) [NCBI Gene 5295] {aka AGM7, GRB1, IMD36, p85, p85-ALPHA, p85alpha}
- **Diseases:** Cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11968257/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11968257/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC11968257/full.md

---
Source: https://tomesphere.com/paper/PMC11968257