Redundancy-aware unsupervised rankings for collections of gene sets
Chiara Balestra, Carlo Maj, Emmanuel Müller, Andreas Mayr

TL;DR
This paper introduces a novel redundancy-aware ranking method for gene set collections using importance scores based on Shapley values, aiming to improve interpretability and reduce redundancy while maintaining gene coverage.
Contribution
It proposes a new importance scoring approach using Shapley values that accounts for redundancy and complexity in gene set collections, enhancing interpretability.
Findings
Reduces redundancy in gene set collections while maintaining coverage.
Improves interpretability of gene set collections in bioinformatics.
Demonstrates practical utility in Gene Sets Enrichment Analysis.
Abstract
The biological roles of gene sets are used to group them into collections. These collections are often high-dimensional, overlapping, and redundant families of sets, which precludes a straightforward interpretation and study of their content. Bioinformatics has looked for solutions to reduce their dimension or increase their interpretability. One possibility lies in aggregating overlapping gene sets to create larger pathways, but the modified pathways are hardly biologically justifiable. We propose to use importance scores to rank the pathways in the collections, studying the context from a set covering perspective. The proposed Shapley values-based scores consider the distribution of the singletons and the size of the sets in the families; furthermore, a trick allows us to circumvent the usual exponential complexity of Shapley values' computation. Finally,…
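To make the set covering perspective concrete, here is a minimal sketch of a Shapley-style score for a toy collection. It assumes a simple coverage game, where the value of a group of gene sets is the number of distinct genes it covers; this is an illustrative assumption and not necessarily the exact game or the trick used in the paper. Under that assumption the Shapley value has a well-known closed form: each gene splits one unit of credit equally among the sets that contain it, so no enumeration of subsets is needed. The function names and the toy collection are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def coverage_shapley(gene_sets):
    """Closed-form Shapley values for the assumed coverage game.

    Each gene hands 1 / (number of sets containing it) to every set
    that contains it, so the cost is linear in the total size of the sets.
    """
    counts = Counter(g for genes in gene_sets.values() for g in set(genes))
    return {
        name: sum(Fraction(1, counts[g]) for g in set(genes))
        for name, genes in gene_sets.items()
    }


def brute_force_shapley(gene_sets):
    """Reference computation: average marginal coverage over all orderings.

    Factorial cost, only usable on tiny collections; kept here to check
    the closed form on the toy example.
    """
    names = list(gene_sets)
    totals = {n: Fraction(0) for n in names}
    orders = list(permutations(names))
    for order in orders:
        covered = set()
        for n in order:
            new_genes = set(gene_sets[n]) - covered
            totals[n] += len(new_genes)  # marginal contribution of n
            covered |= new_genes
    return {n: totals[n] / len(orders) for n in names}


# Hypothetical toy collection: three overlapping pathways over four genes.
toy = {"P1": {"a", "b", "c"}, "P2": {"b", "c"}, "P3": {"c", "d"}}

scores = coverage_shapley(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy collection P2 ranks last because both of its genes are also covered by other sets, which is the kind of redundancy such a ranking is meant to expose. The scores sum to the number of distinct genes, reflecting the efficiency property of Shapley values.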
Taxonomy
Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Machine Learning in Bioinformatics
