HABiC: an algorithm based on the exact computation of the Kantorovich-Rubinstein optimizer for binary classification in transcriptomics
Chiara Cordier, Pascal Jézéquel, Mario Campone, Fabien Panloup, Agnes Basseville

TL;DR
This paper introduces HABiC, a new machine learning algorithm that uses the 1-Wasserstein distance and an exact computation of the Kantorovich-Rubinstein optimizer to improve the precision of binary classification on transcriptomics data.
Contribution
The novel contribution is a binary classification algorithm for transcriptomics data whose decision rule is based on the exact computation of the Kantorovich-Rubinstein optimizer.
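For reference, the Kantorovich-Rubinstein duality expresses the 1-Wasserstein distance between the two class distributions $\mu_{+}$ and $\mu_{-}$ as a supremum over 1-Lipschitz functions. The maximizer $f^{*}$ is the "optimizer" the method is named after. The thresholded decision rule with threshold $\tau$, and the restriction to the $n = n_{+} + n_{-}$ training samples with a Euclidean ground distance, are our illustrative assumptions rather than details taken from the paper:

```latex
W_1(\mu_{+},\mu_{-}) = \sup_{\|f\|_{\mathrm{Lip}} \le 1}
  \Big( \mathbb{E}_{x\sim\mu_{+}}[f(x)] - \mathbb{E}_{x\sim\mu_{-}}[f(x)] \Big),
\qquad
\hat{y}(x) =
\begin{cases}
  +1 & \text{if } f^{*}(x) > \tau,\\
  -1 & \text{otherwise.}
\end{cases}

% On empirical measures over the training samples x_1, ..., x_n this becomes a linear program:
\max_{f \in \mathbb{R}^{n}} \;
  \frac{1}{n_{+}} \sum_{i:\, y_i = +1} f_i \;-\; \frac{1}{n_{-}} \sum_{j:\, y_j = -1} f_j
\quad \text{s.t.} \quad
  f_i - f_j \le \|x_i - x_j\| \quad \forall\, i \neq j .
```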
Findings
HABiC outperformed state-of-the-art algorithms on synthetic datasets with complex variable relationships.
The algorithm achieved higher accuracy in predicting clinical outcomes from transcriptomics data.
Both the exact and the approximate Wasserstein-based methods outperformed standard Euclidean distance-based classifiers.
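The following sketch shows how a distribution-level distance can separate groups that a centroid-based Euclidean rule cannot. It assumes a nearest-centroid style comparison and uses a hypothetical gene whose mean expression is the same in both classes but whose spread differs. The example is ours, not the paper's, and the values are made up. For two equal-size 1-D samples with uniform weights, the 1-Wasserstein distance has a closed form: sort both samples and average the absolute differences.

```python
import numpy as np


def w1_1d_equal_size(a, b):
    """1-Wasserstein distance between two equal-size 1-D samples (uniform weights)."""
    a, b = np.sort(np.asarray(a, dtype=float)), np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this closed form assumes equal sample sizes")
    # Optimal coupling in 1-D matches order statistics pairwise.
    return float(np.mean(np.abs(a - b)))


def centroid_distance(a, b):
    """Euclidean distance between class means (what a nearest-centroid rule compares)."""
    return float(abs(np.mean(a) - np.mean(b)))


a = [-1.0, 1.0]  # hypothetical expression values of one gene, class A
b = [0.0, 0.0]   # same gene, class B: identical mean, no spread
print(centroid_distance(a, b))  # 0.0
print(w1_1d_equal_size(a, b))   # 1.0
```

The centroids coincide, so a centroid distance sees no difference. The Wasserstein distance picks up the difference in spread.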
Abstract
Machine learning analyses of molecular omics datasets largely drive the development of precision medicine in oncology, but mathematical challenges still hamper their application in the clinic. In particular, omics-based learning relies on high-dimensional data with many degrees of freedom and multicollinearity issues, which calls for more tailored algorithms. Here, we have developed a prediction algorithm that relies on the 1-Wasserstein distance to better capture complex relationships between variables, and whose decision rule is based on the exact computation of the Kantorovich-Rubinstein optimizer to increase the algorithm's precision. We explored dimension reduction and aggregation methods to improve its robustness. The exact method was compared with a neural network-based approximate method, as well as with standard Euclidean distance-based classifiers. Experimental results on…
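On a toy problem, a decision rule of this kind can be sketched by solving the discrete Kantorovich-Rubinstein dual as a linear program with SciPy's `linprog`. This is a minimal illustration under our own assumptions, not the HABiC implementation. The following are all hypothetical:

- the Euclidean ground distance;
- the extension of the potential to new samples, computed as the average of the McShane and Whitney 1-Lipschitz extensions;
- the midpoint threshold;
- the function names `fit_kr_potential` and `predict`.

The dimension reduction and aggregation steps mentioned in the abstract are omitted.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def fit_kr_potential(X, y):
    """Hypothetical sketch: exact discrete KR potential between classes +1 and -1.

    Solves  max_f  mean_{y=+1} f - mean_{y=-1} f
            s.t.   f_i - f_j <= ||x_i - x_j||  for all i != j
    (KR dual of W1 restricted to the training points, Euclidean cost assumed).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    pos = y == 1
    # Objective weights: +1/n_pos on class +1, -1/n_neg on class -1.
    w = np.where(pos, 1.0 / pos.sum(), -1.0 / (~pos).sum())
    D = cdist(X, X)
    # One Lipschitz constraint per ordered pair (i, j) with i != j.
    i, j = np.nonzero(~np.eye(n, dtype=bool))
    rows = np.arange(len(i))
    A = np.zeros((len(i), n))
    A[rows, i] = 1.0
    A[rows, j] = -1.0
    b = D[i, j]
    # f is defined up to an additive constant; pin f_0 = 0 so the feasible set is bounded.
    bounds = [(0.0, 0.0)] + [(None, None)] * (n - 1)
    # linprog minimizes, so negate the objective to maximize w @ f.
    res = linprog(-w, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    f = res.x
    # Assumed threshold: midpoint between the mean potential of each class.
    threshold = 0.5 * (f[pos].mean() + f[~pos].mean())
    return f, -res.fun, threshold


def predict(X_train, f, threshold, X_new):
    """Extend f to new samples (assumed rule) and threshold it into labels +1 / -1."""
    D = cdist(np.asarray(X_new, dtype=float), np.asarray(X_train, dtype=float))
    upper = np.min(f[None, :] + D, axis=1)  # McShane extension (largest 1-Lipschitz one)
    lower = np.max(f[None, :] - D, axis=1)  # Whitney extension (smallest 1-Lipschitz one)
    g = 0.5 * (upper + lower)               # average of two 1-Lipschitz extensions
    return np.where(g > threshold, 1, -1)


# Toy usage with hypothetical 1-D "expression" values.
X = np.array([[0.0], [1.0], [3.0], [5.0]])
y = np.array([1, 1, -1, -1])
f, w1, t = fit_kr_potential(X, y)
labels = predict(X, f, t, np.array([[0.5], [4.0]]))
print(round(w1, 6), labels)
```

The LP has n(n-1) constraints for n training samples, so this exact formulation, as written, only scales to small cohorts. On the toy data the optimal value equals the closed-form 1-D Wasserstein distance between {0, 1} and {3, 5}, which is 3.5.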
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
