Separating and reintegrating latent variables to improve classification of genomic data
Nora Yujia Payne, Johann A. Gagnon-Bartsch

TL;DR
This paper introduces the cross-residualization classifier, a novel method that estimates and adjusts for latent variables in genomic data, improving classification accuracy by effectively separating and reintegrating these hidden effects.
Contribution
The paper presents a new classifier that accounts for latent variables in genomic data by estimating, residualizing, and reintegrating them, enhancing classification performance.
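To make the estimate, residualize, and reintegrate idea concrete, here is a minimal sketch. It is not the authors' cross-residualization procedure, whose details are not given in this summary. It assumes PCA as a stand-in latent-variable estimator, a nearest-centroid rule as the base classifier, and an equal-weight sum of standardized scores as the ensemble step. All function names (`fit_latent`, `split`, `centroid_score`, `predict`) and the parameter `k` are hypothetical.

```python
import numpy as np

def fit_latent(X_train, k):
    """Estimate k latent directions with PCA (an assumed stand-in estimator)."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are the principal directions of the centered training data.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:k]

def split(X, mean, loadings):
    """Separate X into latent scores and residualized features."""
    Xc = X - mean
    scores = Xc @ loadings.T            # estimated latent variables
    residual = Xc - scores @ loadings   # features with latent variation removed
    return scores, residual

def centroid_score(F_train, y_train, F):
    """Signed score: positive means F is closer to the class-1 centroid."""
    c0 = F_train[y_train == 0].mean(axis=0)
    c1 = F_train[y_train == 1].mean(axis=0)
    d0 = ((F - c0) ** 2).sum(axis=1)
    d1 = ((F - c1) ** 2).sum(axis=1)
    return d0 - d1

def predict(X_train, y_train, X_test, k=2):
    """Classify using both the latent part and the residualized part."""
    mean, loadings = fit_latent(X_train, k)
    s_tr, r_tr = split(X_train, mean, loadings)
    s_te, r_te = split(X_test, mean, loadings)

    total = np.zeros(X_test.shape[0])
    for F_tr, F_te in ((s_tr, s_te), (r_tr, r_te)):
        # Standardize each part's score by its spread on the training data
        # so that neither part dominates the sum (an assumed weighting).
        scale = centroid_score(F_tr, y_train, F_tr).std() + 1e-12
        total += centroid_score(F_tr, y_train, F_te) / scale

    # Reintegration step: the summed score decides the predicted class.
    return (total > 0).astype(int)
```

Calling `predict(X_train, y_train, X_test, k=2)` with binary labels coded 0/1 returns one predicted label per test row. Note that the scaling here uses in-sample training scores, which can be optimistic when features far outnumber samples; a cross-fitted variant would be a natural refinement, but that is a guess about the motivation rather than something this summary states.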
Findings
Performs well on simulated data and real genomic datasets.
Offers substantial gains over existing classifiers.
Effectively separates and reintegrates latent variables.
Abstract
Genomic datasets contain the effects of various unobserved biological variables in addition to the variable of primary interest. These latent variables often affect a large number of features (e.g., genes) and thus give rise to dense latent variation, which presents both challenges and opportunities for classification. Some of these latent variables may be partially correlated with the phenotype of interest and therefore helpful, while others may be uncorrelated and thus merely contribute additional noise. Moreover, whether potentially helpful or not, these latent variables may obscure weaker effects that impact only a small number of features but more directly capture the signal of primary interest. We propose the cross-residualization classifier to better account for the latent variables in genomic data. Through an adjustment and ensemble procedure, the cross-residualization…
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Genomics and Phylogenetic Studies
