Is graph-based feature selection of genes better than random?
Mohammad Hashir, Paul Bertin, Martin Weiss, Vincent Frappier, Theodore J. Perkins, Geneviève Boucher, Joseph Paul Cohen

TL;DR
This study evaluates whether biologically derived gene interaction graphs improve gene expression prediction over random graphs. It finds that random graphs perform nearly as well, which suggests that information about gene expression is widely distributed across many genes.
Contribution
The paper introduces a method for assessing the quality of gene interaction graphs for gene expression prediction using a Single Gene Inference (SGI) task, and applies it to compare biological graphs against random graphs.
Findings
Dependencies in gene expression data are captured almost as well by random graphs as by biological ones.
Gene expression information is widely distributed across many genes.
Biologically relevant graphs may not provide significant advantages over random graphs.
Abstract
Gene interaction graphs aim to capture various relationships between genes and represent decades of biology research. When making predictions from genomic data, these graphs could be used to overcome the curse of dimensionality by making machine learning models sparser and more consistent with common biological knowledge. In this work, we assess whether these graphs capture dependencies seen in gene expression data better than random graphs do. We formulate a condition that graphs should satisfy to provide good prior knowledge and propose to test it using a 'Single Gene Inference' (SGI) task. We compare random graphs with seven major gene interaction graphs published by different research groups, aiming to measure the true benefit of using biologically relevant graphs in this context. Our analysis finds that dependencies can be captured almost as well by random graphs, which…
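The SGI task is only named in this summary, not specified, so the following is a rough sketch under our own assumptions rather than the authors' protocol. For one target gene, we predict its expression from the genes adjacent to it in a graph and compare that against the same prediction made from randomly chosen genes of the same count. The choice of an ordinary least-squares regressor, the held-out R^2 score, the dict-of-neighbor-lists graph format, and the use of random same-size gene sets as a stand-in for random graphs are all assumptions, and every function name here is hypothetical.

```python
import numpy as np


def sgi_r2(expr: np.ndarray, target: int, features: list[int], train_frac: float = 0.8) -> float:
    """Hypothetical SGI-style score: held-out R^2 of predicting one gene from others.

    expr is a (samples x genes) matrix. A least-squares linear model is an
    assumption made for illustration, not necessarily the paper's model.
    """
    n = expr.shape[0]
    n_train = int(train_frac * n)
    # Feature columns plus an intercept column of ones.
    X = np.hstack([expr[:, features], np.ones((n, 1))])
    y = expr[:, target]
    coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    # Evaluate on the held-out samples only.
    y_test = y[n_train:]
    pred = X[n_train:] @ coef
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def random_neighbours(rng: np.random.Generator, n_genes: int, target: int, k: int) -> list[int]:
    """Draw k distinct genes other than the target (assumed random baseline)."""
    candidates = np.array([g for g in range(n_genes) if g != target])
    return rng.choice(candidates, size=k, replace=False).tolist()


def compare_graph_to_random(expr: np.ndarray, graph: dict[int, list[int]], target: int,
                            n_random: int = 20, seed: int = 0) -> tuple[float, float]:
    """Return (score using graph neighbours, mean score over random same-size sets)."""
    rng = np.random.default_rng(seed)
    neigh = graph[target]
    graph_score = sgi_r2(expr, target, neigh)
    random_scores = [
        sgi_r2(expr, target, random_neighbours(rng, expr.shape[1], target, len(neigh)))
        for _ in range(n_random)
    ]
    return graph_score, float(np.mean(random_scores))


# Toy usage on synthetic data (not data from the paper): gene 0 is built from
# genes 1 and 2, and a hypothetical graph lists exactly those two as neighbours.
rng = np.random.default_rng(1)
expr = rng.normal(size=(500, 50))
expr[:, 0] = expr[:, 1] + expr[:, 2] + 0.1 * rng.normal(size=500)
graph_score, random_score = compare_graph_to_random(expr, {0: [1, 2]}, target=0)
print(graph_score, random_score)
```

In this toy setup the graph neighbours carry almost all the signal, so the graph score is clearly higher. The toy numbers say nothing about real expression data; the paper's finding is precisely that, on real data, random gene sets come close to the biological graphs.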
Taxonomy
Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Gene Regulatory Network Analysis
Methods: Test
