Optirank: classification for RNA-Seq data with optimal ranking reference genes
Paola Malsot (1), Filipe Martins (1), Didier Trono (1), Guillaume Obozinski (1, 2, 3) ((1) Ecole Polytechnique Fédérale de Lausanne, (2) Swiss Data Science Center, (3) ETH Zürich)

TL;DR
Optirank introduces a robust logistic regression model that learns optimal reference genes for ranking in RNA-Seq classification, improving generalization across datasets with distribution shifts.
Contribution
The paper proposes a novel method, optirank, which learns reference genes for ranking, enhancing robustness and sparsity in RNA-Seq classification tasks.
Findings
Optirank performs at least as well as classical rank-based logistic regression.
It produces sparser solutions, aiding interpretability.
Multi-source learning further improves robustness against dataset shifts.
Abstract
Classification algorithms using RNA-Sequencing (RNA-Seq) data as input are used in a variety of biological applications. By nature, RNA-Seq data is subject to uncontrolled fluctuations both within and especially across datasets, which presents a major difficulty for a trained classifier to generalize to an external dataset. Replacing raw gene counts with the rank of gene counts inside an observation has proven effective to mitigate this problem. However, the rank of a feature is by definition relative to all other features, including highly variable features that introduce noise in the ranking. To address this problem and obtain more robust ranks, we propose a logistic regression model, optirank, which learns simultaneously the parameters of the model and the genes to use as a reference set in the ranking. We show the effectiveness of this method on simulated data. We also consider real…
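The idea of ranking against a reference set, as described in the abstract, can be illustrated with a small sketch. It is not the authors' implementation: the function name `rank_against_reference`, the toy counts, and the convention of counting reference genes with a strictly smaller value are all assumptions made for illustration, and the reference set is fixed by hand here, whereas optirank learns it jointly with the model parameters.

```python
import numpy as np

def rank_against_reference(x, reference_mask):
    """Hypothetical helper: for each gene, count the reference genes
    whose value is strictly smaller within the same observation."""
    x = np.asarray(x, dtype=float)
    ref_values = np.sort(x[np.asarray(reference_mask, dtype=bool)])
    # side="left" gives the number of reference values strictly below x[j]
    return np.searchsorted(ref_values, x, side="left")

# Two toy observations; gene 3 is highly variable, the others are stable.
sample_a = np.array([5.0, 100.0, 20.0, 50.0, 7.0])
sample_b = np.array([5.0, 100.0, 20.0, 1.0, 7.0])

all_genes = np.ones(5, dtype=bool)                       # classical ranking
stable_only = np.array([True, True, True, False, True])  # assumed reference set

full_a = rank_against_reference(sample_a, all_genes)
full_b = rank_against_reference(sample_b, all_genes)
ref_a = rank_against_reference(sample_a, stable_only)
ref_b = rank_against_reference(sample_b, stable_only)

stable_idx = [0, 1, 2, 4]
# Classical ranks of the stable genes shift when gene 3 fluctuates ...
print(full_a[stable_idx], full_b[stable_idx])
# ... while ranks relative to the stable reference set do not.
print(ref_a[stable_idx], ref_b[stable_idx])
```

In this toy case, the fluctuation of a single variable gene changes the classical ranks of three of the four stable genes, while their ranks relative to the reference set are identical in both observations.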
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
Methods: Test · Logistic Regression
