# Enrichment on steps, not genes, improves inference of differentially expressed pathways

**Authors:** Nicholas Markarian, Kimberly M. Van Auken, Dustin Ebert, Paul W. Sternberg

PMC · DOI: 10.1371/journal.pcbi.1011968 · 2024-03-25

## TL;DR

This paper improves pathway enrichment analysis by enriching on pathway steps rather than individual genes. This recovers differentially expressed pathways that gene-level enrichment misses.

## Contribution

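The approach improves pathway enrichment analysis in two ways. It treats each set of interchangeable genes that can individually enable the same pathway step as a single entity, and it weights each set to account for its number of genes.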
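The grouping idea can be sketched as follows. Each pathway step maps to the set of genes that can each enable it, and a set counts only once. This is a minimal Python sketch that assumes a step is "hit" when any gene in its set is on the differential expression list. The toy pathway, the placeholder genes `GENE_B`/`GENE_C`, and all function names are hypothetical. The hexokinase step, which HK1, HK2, HK3 or GCK can each catalyze, stands in as the OR-logic set.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical toy pathway: each step maps to the genes that can each enable it
# on their own (OR logic). Single-gene steps are sets of size one.
PATHWAY_STEPS: Dict[str, FrozenSet[str]] = {
    "hexokinase": frozenset({"HK1", "HK2", "HK3", "GCK"}),
    "step_B": frozenset({"GENE_B"}),  # hypothetical single-gene step
    "step_C": frozenset({"GENE_C"}),  # hypothetical single-gene step
}


def steps_hit(steps: Dict[str, FrozenSet[str]], de_genes: Set[str]) -> Set[str]:
    """Steps whose enabling set contains at least one DE gene (assumed hit rule)."""
    return {step for step, genes in steps.items() if genes & de_genes}


def gene_level_counts(steps: Dict[str, FrozenSet[str]], de_genes: Set[str]) -> Tuple[int, int]:
    """(DE genes in pathway, pathway size in genes), as in gene list enrichment."""
    all_genes: Set[str] = set().union(*steps.values())
    return len(all_genes & de_genes), len(all_genes)


def step_level_counts(steps: Dict[str, FrozenSet[str]], de_genes: Set[str]) -> Tuple[int, int]:
    """(steps hit, pathway size in steps): each gene set counts as one entity."""
    return len(steps_hit(steps, de_genes)), len(steps)


de = {"HK2", "GENE_B"}
print(gene_level_counts(PATHWAY_STEPS, de))  # (2, 6)
print(step_level_counts(PATHWAY_STEPS, de))  # (2, 3)
```

The same two DE genes cover a third of the pathway's genes but two thirds of its steps. The four-gene hexokinase set inflates the gene count without adding steps.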

## Key findings

- Treating gene sets as single entities increases sensitivity to pathways with OR logic.
- The method recovers pathways missed by traditional gene list enrichment analysis.
- In medically relevant datasets, a significant proportion of the enriched pathways are not found by gene list enrichment analysis.
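The sensitivity gain for pathways with OR logic can be illustrated with the standard one-sided hypergeometric over-representation test. This is not the paper's weighted test. The sketch applies the same tail probability twice: once to gene counts and once to step counts. All counts are hypothetical. The function name `hypergeom_sf` is illustrative only.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    This is the one-sided upper tail used by standard over-representation tests.
    """
    upper = min(successes, draws)
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    )
    # Exact integer arithmetic; int / int yields a float even for huge integers.
    return tail / comb(population, draws)


# Gene level (hypothetical counts): 10,000 background genes, 500 of them DE.
# The pathway has 8 single-gene steps plus 2 sets of 40 interchangeable genes,
# so it counts as 88 genes; 6 steps are hit, each by one DE gene.
p_gene = hypergeom_sf(6, population=10_000, successes=88, draws=500)

# Step level (hypothetical counts): 6,000 background entities, 450 of them hit.
# The same pathway is 10 entities, 6 of which are hit.
p_step = hypergeom_sf(6, population=6_000, successes=10, draws=450)

print(f"gene level p = {p_gene:.3g}, step level p = {p_step:.3g}")
```

With these toy numbers, the gene-level test is far from significant because the two 40-gene sets inflate the pathway size. The step-level test, by contrast, gives a small p-value. The unweighted step-level version still treats a 40-gene set as no more likely to be hit by chance than a single gene. Weighting sets by size is intended to correct this.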

## Abstract

Enrichment analysis is frequently used in combination with differential expression data to investigate potential commonalities amongst lists of genes and generate hypotheses for further experiments. However, current enrichment analysis approaches on pathways ignore the functional relationships between genes in a pathway, particularly OR logic that occurs when a set of proteins can each individually perform the same step in a pathway. As a result, these approaches miss pathways with large or multiple sets because of an inflation of pathway size (when measured as the total gene count) relative to the number of steps. We address this problem by enriching on step-enabling entities in pathways. We treat sets of protein-coding genes as single entities, and we also weight sets to account for the number of genes in them using the multivariate Fisher’s noncentral hypergeometric distribution. We then show three examples of pathways that are recovered with this method and find that the results have significant proportions of pathways not found in gene list enrichment analysis.

Genome-scale experiments typically identify sets of genes which are primarily analyzed by enrichment analysis to identify relevant pathways that may be perturbed. Curated pathway models have rich structure that we believe can be exploited to get better results. Some pathway steps are enabled by sets of interchangeable genes which inflate the gene count of their respective pathways relative to the number of steps. We improve sensitivity towards these pathways in enrichment analysis by performing enrichment on steps. We then use this approach to identify pathways that would otherwise be missed in medically relevant datasets to gain new insights.
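The weighting described in the abstract uses the multivariate Fisher's noncentral hypergeometric distribution. Suppose x_i items are drawn from each class i, where class i has m_i items and odds weight ω_i, and n items are drawn in total. The probability of that outcome is proportional to ∏ C(m_i, x_i) ω_i^{x_i}, normalized over all outcomes with the same total n. The sketch below evaluates this probability mass function by brute-force enumeration, which only scales to toy inputs. This summary does not say how the paper groups entities or sets the weights. Grouping entities by gene count and using the gene count as the weight are therefore hypothetical choices here, as is the function name `mfnch_pmf`.

```python
from itertools import product
from math import comb, prod
from typing import Sequence


def mfnch_pmf(x: Sequence[int], m: Sequence[int], weights: Sequence[float]) -> float:
    """Multivariate Fisher's noncentral hypergeometric pmf, by brute force.

    x[i]: items drawn from class i; m[i]: items in class i; weights[i]: odds
    weight of class i. The total number drawn is n = sum(x).
    """
    if len(x) != len(m) or len(m) != len(weights):
        raise ValueError("x, m and weights must have the same length")
    if any(xi < 0 or xi > mi for xi, mi in zip(x, m)):
        return 0.0
    n = sum(x)

    def term(y: Sequence[int]) -> float:
        # Unnormalized weight of outcome y: prod_i C(m_i, y_i) * w_i ** y_i
        return prod(comb(mi, yi) * w ** yi for mi, yi, w in zip(m, y, weights))

    # Normalizing constant: sum of term(y) over every outcome with sum(y) == n.
    norm = sum(term(y) for y in product(*(range(mi + 1) for mi in m)) if sum(y) == n)
    return term(x) / norm


# Hypothetical entity classes: 5 single-gene entities (weight 1) and
# 2 four-gene sets (weight 4, an assumed weighting). Three entities are hit.
print(mfnch_pmf([1, 2], m=[5, 2], weights=[1.0, 4.0]))  # 80/170, about 0.4706
print(mfnch_pmf([1, 2], m=[5, 2], weights=[1.0, 1.0]))  # 5/35, about 0.1429 (central case)
```

With equal weights, the distribution reduces to the ordinary multivariate hypergeometric distribution. Raising the weight of the four-gene sets makes outcomes that hit them more probable. This is the effect that weighting sets by size is meant to capture.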

## Full-text entities

- **Genes:** ADAMTS14 (ADAM metallopeptidase with thrombospondin type 1 motif 14) [NCBI Gene 140766], Cfd (complement factor D) [NCBI Gene 11537] {aka Adn, DF}, LMNA (lamin A/C) [NCBI Gene 4000] {aka CDCD1, CDDC, CMD1A, CMT2B1, EMD2, FPL}, IER3 (immediate early response 3) [NCBI Gene 8870] {aka DIF-2, DIF2, GLY96, IEX-1, IEX-1L, IEX1}, C1ra (complement component 1, r subcomponent A) [NCBI Gene 50909] {aka C1r, mC1rA}, PTPA (protein phosphatase 2 phosphatase activator) [NCBI Gene 5524] {aka PARK25, PP2A, PPP2R4, PR53}, P4HA3 (prolyl 4-hydroxylase subunit alpha 3) [NCBI Gene 283208], MELTF (melanotransferrin) [NCBI Gene 4241] {aka CD228, MAP97, MFI2, MTF1, MTf}, COL1A1 (collagen type I alpha 1 chain) [NCBI Gene 1277] {aka CAFYD, EDSARTH1, EDSC, OI1, OI2, OI3}, HK2 (hexokinase 2) [NCBI Gene 3099] {aka HKII, HXK2}, Serping1 (serine (or cysteine) peptidase inhibitor, clade G, member 1) [NCBI Gene 12258] {aka C1 Inh, C1INH., C1Inh, C1nh}, GCK (glucokinase) [NCBI Gene 2645] {aka FGQTL3, GK, GLK, HHF3, HK4, HKIV}, Elane (elastase, neutrophil expressed) [NCBI Gene 50701] {aka Ela2, F430011M15Rik, NE}, Masp1 (MBL associated serine protease 1) [NCBI Gene 17174] {aka CCPII, Crarf, Masp1/3}, PIK3CD (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit delta) [NCBI Gene 5293] {aka APDS, IMD14, IMD14A, IMD14B, P110DELTA, PI3K}, F2 (coagulation factor II, thrombin) [NCBI Gene 2147] {aka PT, RPRGL2, THPH1}, P4HA1 (prolyl 4-hydroxylase subunit alpha 1) [NCBI Gene 5033] {aka P4HA}, BMP1 (bone morphogenetic protein 1) [NCBI Gene 649] {aka OI13, PCOLC, PCP, TLD}, Cfh (complement component factor h) [NCBI Gene 12628] {aka Mud-1, NOM, Sas-1, Sas1}, P4HA2 (prolyl 4-hydroxylase subunit alpha 2) [NCBI Gene 8974] {aka MYP25, lncRNA-PE}, IL7R (interleukin 7 receptor) [NCBI Gene 3575] {aka CD127, CDW127, IL-7R-alpha, IL-7Ralpha, IL7RA, IL7Ralpha}, HK3 (hexokinase 3) [NCBI Gene 3101] {aka HKIII, HXK3}, AKT1 (AKT serine/threonine kinase 1) [NCBI Gene 207] {aka AKT, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA}, HK1 (hexokinase 1) [NCBI Gene 3098] {aka CNSHA5, HK, HK1-ta, HK1-tb, HK1-tc, HKD}
- **Diseases:** myocardial infarction (MESH:D009203), RV heart failure (MESH:D006333), cardiac remodeling (MESH:D020257), Cancer (MESH:D009369), LV (MESH:D018487), Dilated and arrhythmogenic cardiomyopathy (MESH:D002311), COVID-19 (MESH:D000086382), CAM (MESH:D020786), nonalcoholic steatohepatitis (MESH:D065626), pulmonary hypertension (MESH:D006976), aortic constriction (MESH:D015877), platelet aggregation (MESH:D001791), hypertrophy (MESH:D006984), cardiovascular and cardiac diseases (MESH:D002318), PCOS (MESH:D011085), gastric cancers (MESH:D013274), LV failure (MESH:D051437), thrombotic (MESH:D013927)
- **Chemicals:** phosphate (MESH:D010710), hydroxyproline (MESH:D006909), DMSO (MESH:D004121), Pc (MESH:C053518), Pb (MESH:D007854), Triton X-100 (MESH:D017830), glucose (MESH:D005947), Pd (MESH:D010165), Pa (MESH:D011478), fatty acids (MESH:D005227), PIP2 (MESH:D019269), PI5P (-), VLCFA (MESH:C017364), lipids (MESH:D008055), Fatty Acyl Co-A (MESH:D000214)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]

## Figures

The complete paper contains 15 figures with captions: https://tomesphere.com/paper/PMC10994554/full.md

---
Source: https://tomesphere.com/paper/PMC10994554