Redundancy-aware unsupervised rankings for collections of gene sets

Chiara Balestra; Carlo Maj; Emmanuel Müller; Andreas Mayr

arXiv:2307.16182 · q-bio.QM · August 1, 2023

Open Access

TL;DR

This paper introduces a novel redundancy-aware ranking method for gene set collections using importance scores based on Shapley values, aiming to improve interpretability and reduce redundancy while maintaining gene coverage.

Contribution

It proposes a new importance scoring approach using Shapley values that accounts for redundancy and complexity in gene set collections, enhancing interpretability.

Findings

1. Reduces redundancy in gene set collections while maintaining coverage.

2. Improves interpretability of gene set collections in bioinformatics.

3. Demonstrates practical utility in Gene Sets Enrichment Analysis.

Abstract

Gene sets are grouped into collections according to their biological roles. These collections are often high-dimensional, overlapping, and redundant families of sets, which precludes a straightforward interpretation and study of their content. Bioinformatics research has looked for ways to reduce their dimension or increase their interpretability. One possibility is to aggregate overlapping gene sets into larger pathways, but the modified pathways are hardly biologically justifiable. We propose to use importance scores to rank the pathways in the collections, studying the context from a set covering perspective. The proposed Shapley values-based scores consider the distribution of the singletons and the size of the sets in the families; furthermore, a trick allows us to circumvent the usual exponential complexity of Shapley values' computation. Finally,…
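The abstract does not spell out the game or the trick it uses, so the following is only a minimal sketch under stated assumptions. It assumes a plain coverage game, in which a coalition of gene sets is worth the number of distinct genes it covers. For that game the Shapley value is known to have a closed form: each gene contributes 1/c to every set containing it, where c is the number of sets that contain the gene. The function names and the toy gene identifiers below are hypothetical, and the paper's actual value function may differ. The brute-force routine is included only to check the closed form on a small example.

```python
from fractions import Fraction
from itertools import permutations


def shapley_closed_form(collection):
    """Shapley values for an ASSUMED coverage game, in time linear in the input.

    collection maps a gene-set name to a Python set of gene identifiers.
    Each gene g adds 1 / (number of sets containing g) to every set containing it.
    """
    counts = {}
    for genes in collection.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    return {
        name: sum((Fraction(1, counts[g]) for g in genes), Fraction(0))
        for name, genes in collection.items()
    }


def shapley_brute_force(collection):
    """Same values by averaging marginal coverage over all orderings (factorial cost)."""
    names = list(collection)
    totals = {name: Fraction(0) for name in names}
    n_orders = 0
    for order in permutations(names):
        n_orders += 1
        covered = set()
        for name in order:
            # Marginal contribution: genes not covered by the sets placed earlier.
            totals[name] += len(collection[name] - covered)
            covered |= collection[name]
    return {name: total / n_orders for name, total in totals.items()}


# Hypothetical toy collection: three overlapping gene sets over four genes.
collection = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3"},
    "C": {"g3", "g4"},
}

scores = shapley_closed_form(collection)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In this sketch a set scores higher when it is large and when its genes appear in few other sets, which matches the abstract's remark that the scores depend on the distribution of the singletons and on set sizes. The sketch does not attempt the redundancy-aware reordering, which the truncated abstract does not describe.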

Taxonomy

Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Machine Learning in Bioinformatics