Handling highly correlated genes in prediction analysis of genomic studies

Li Xing, Songwan Joun, Kurt Mackay, Mary Lesperance, and Xuekui Zhang

arXiv:2007.02455 · stat.AP · April 11, 2022 · 1 citation

Open Access

TL;DR

This paper introduces an algorithm that groups highly correlated genes in genomic prediction models. Representing each group as a unit is intended to preserve the biological signal and to improve the robustness and interpretability of the model.

Contribution

The novel grouping algorithm effectively handles correlated genes, enhancing prediction accuracy and biomarker discovery in genomic studies.

Findings

01. Significantly outperforms standard models in phenotype prediction

02. Improves robustness of feature selection under condition changes

03. Identifies gene groups as potential biomarkers

Abstract

Background: Selecting feature genes to predict phenotypes is a typical task in analyzing genomics data. Although many general-purpose algorithms have been developed for prediction, handling highly correlated genes in the prediction model is still not well addressed. High correlation among genes introduces technical problems, such as multicollinearity, leading to unreliable prediction models. Furthermore, when a causal gene (whose variants have an actual biological effect on a phenotype) is highly correlated with other genes, most algorithms select the feature gene from the correlated group in a purely data-driven manner. Since the correlation structure among genes could change substantially when conditions change, a prediction model based on incorrectly selected feature genes is unreliable. Therefore, we aim to keep the causal biological signal in the prediction…
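
To make the idea of treating correlated genes as a group concrete, the following is a minimal, generic sketch and not the authors' algorithm, which the truncated abstract does not describe. It assumes synthetic expression data, an arbitrary correlation threshold of 0.9, grouping by connected components of the thresholded correlation graph, and a group mean as the representative feature; the function names `group_correlated_genes` and `group_representatives` are hypothetical.

```python
import numpy as np

def group_correlated_genes(X, threshold=0.9):
    """Group columns of X whose Pearson correlation exceeds `threshold`.

    Groups are connected components of the thresholded correlation graph.
    Only positive correlations are linked, so that averaging members of a
    group does not cancel their signal. Returns sorted lists of column indices.
    """
    n_genes = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)   # genes are columns
    adjacent = corr > threshold
    unvisited = set(range(n_genes))
    groups = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        stack, members = [seed], [seed]
        while stack:                       # depth-first search from the seed
            g = stack.pop()
            neighbours = [h for h in unvisited if adjacent[g, h]]
            for h in neighbours:
                unvisited.remove(h)
                stack.append(h)
                members.append(h)
        groups.append(sorted(members))
    return groups

def group_representatives(X, groups):
    """One feature per group: the mean of the standardized member columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.column_stack([Z[:, g].mean(axis=1) for g in groups])

# Synthetic data (an assumption for illustration): genes 0-2 share one
# underlying signal, genes 3 and 4 are independent noise.
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)
X = np.column_stack([
    base + 0.05 * rng.normal(size=n),
    base + 0.05 * rng.normal(size=n),
    base + 0.05 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])

groups = group_correlated_genes(X, threshold=0.9)
features = group_representatives(X, groups)
print(groups)
print(features.shape)
```

A downstream prediction model would then be fitted on `features` rather than on the individual gene columns, so that no single member of a correlated group is picked arbitrarily. The threshold and the choice of representative are design decisions of this sketch, not values taken from the paper.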

Taxonomy

Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Single-cell and spatial transcriptomics