Interpretable Models via Pairwise permutations algorithm

Troy Maasland; João Pereira; Diogo Bastos; Marcus de Goffau; Max Nieuwdorp; Aeilko H. Zwinderman; Evgeni Levin

arXiv:2111.09145·cs.LG·November 18, 2021

Open Access

TL;DR

This paper introduces the pairwise permutation algorithm (PPA), a new method designed to reduce correlation bias in feature importance assessments, improving interpretability in high-dimensional biological data analysis.

Contribution

The paper presents the theoretical foundation of PPA and demonstrates its effectiveness in correcting correlation bias through toy and microbiome datasets.

Findings

01. PPA corrects correlation effects in feature importance.

02. PPA identifies biologically relevant biomarkers.

03. PPA improves interpretability in high-dimensional data.

Abstract

A common pitfall in high-dimensional biological data sets is correlation between features. Such correlation can lead statistical and machine learning methods to overvalue or undervalue the correlated predictors, while the truly relevant ones are ignored. In this paper, we define a new method, the pairwise permutation algorithm (PPA), which aims to mitigate correlation bias in feature importance values. First, we provide a theoretical foundation that builds on previous work on permutation importance. We then apply PPA to a toy data set, where we demonstrate its ability to correct the correlation effect. Finally, we test PPA on a microbiome shotgun data set to show that it can already identify biologically relevant biomarkers.
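The correlation effect mentioned in the abstract can be illustrated with a small sketch of ordinary permutation importance. The abstract does not specify how PPA works, so the code below is not the authors' algorithm: the toy data, the fixed `model` function, and the idea of shuffling two columns with one shared permutation are all assumptions made for illustration only.

```python
import random


def mse(model, X, y):
    """Mean squared error of a prediction function on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permute_columns(X, cols, rng):
    """Copy X and shuffle the given columns using one shared row permutation."""
    order = list(range(len(X)))
    rng.shuffle(order)
    X_perm = [list(row) for row in X]
    for i, j in enumerate(order):
        for c in cols:
            X_perm[i][c] = X[j][c]
    return X_perm


def permutation_importance(model, X, y, cols, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling the columns in `cols` together."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    increases = [
        mse(model, permute_columns(X, cols, rng), y) - base
        for _ in range(n_repeats)
    ]
    return sum(increases) / n_repeats


# Hypothetical toy data: features 0 and 1 are near-duplicates of one signal,
# feature 2 is independent noise that carries no information about y.
data_rng = random.Random(42)
X, y = [], []
for _ in range(500):
    signal = data_rng.gauss(0, 1)
    X.append([signal, signal + data_rng.gauss(0, 0.1), data_rng.gauss(0, 1)])
    y.append(signal)


def model(row):
    # Assumed stand-in for a fitted model that splits its weight
    # evenly between the two correlated features.
    return 0.5 * row[0] + 0.5 * row[1]


# Importance of each feature shuffled on its own.
imp_single = [permutation_importance(model, X, y, [c]) for c in range(3)]
# Importance of the correlated pair shuffled jointly.
imp_pair = permutation_importance(model, X, y, [0, 1])

print("single-feature importances:", imp_single)
print("joint importance of features 0 and 1:", imp_pair)
```

In this toy setting, each correlated feature on its own receives only part of the credit, and the joint score for the pair exceeds the sum of the two individual scores, while the irrelevant feature scores zero. This is one way to see why single-feature permutation scores can misstate the relevance of correlated predictors.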

Taxonomy

Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks