Looking for (Genomic) Needles in a Haystack: Sparsity-Driven Search for Identifying Correlated Genetic Mutations in Cancer
Ritvik Prabhu, Emil Vatai, Bernard Moussad, Emmanuel Jeannot, Ramu Anandakrishnan, Wu-chun Feng, Mohamed Wahib

TL;DR
This paper introduces P-DFS, a sparsity-exploiting algorithm that significantly accelerates the identification of multi-gene mutation combinations in cancer, enabling feasible analysis of higher-order hits.
Contribution
The paper presents P-DFS, a novel pruning algorithm leveraging data sparsity to efficiently search for correlated genetic mutations in cancer, reducing computational complexity dramatically.
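To make the idea of sparsity-driven pruning concrete, here is a minimal Python sketch of a depth-first search with backtracking over gene combinations. It is not the authors' implementation. The pruning rule (abandon a partial combination as soon as no tumor sample carries mutations in all of its genes), the bitmask encoding, and the names `pruned_dfs`, `gene_masks`, and `demo_masks` are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def pruned_dfs(gene_masks: List[int], h: int) -> Iterator[Tuple[Tuple[int, ...], int]]:
    """Yield (gene_indices, sample_mask) for each h-gene combination that is
    jointly mutated in at least one sample.

    gene_masks[g] is an integer bitmask (hypothetical encoding): bit s is set
    when sample s has a mutation in gene g.
    """
    n = len(gene_masks)

    def visit(start: int, chosen: List[int], mask: int):
        if len(chosen) == h:
            yield tuple(chosen), mask
            return
        remaining = h - len(chosen)
        # Stop early when too few genes are left to complete the combination.
        for g in range(start, n - remaining + 1):
            new_mask = mask & gene_masks[g]
            if new_mask == 0:
                # Assumed pruning rule: no sample carries all chosen genes,
                # so no extension of this partial combination can either.
                continue
            chosen.append(g)
            yield from visit(g + 1, chosen, new_mask)
            chosen.pop()  # backtrack

    # -1 has every bit set, so the first AND returns the gene's own mask.
    yield from visit(0, [], -1)


# Toy data: 5 genes, 4 samples, sparse mutation pattern.
demo_masks = [0b0011, 0b0001, 0b0110, 0b1000, 0b0000]
demo_hits = list(pruned_dfs(demo_masks, 2))
print(demo_hits)  # [((0, 1), 1), ((0, 2), 2)]
```

With sparse data most partial combinations reach an empty sample set after one or two genes, so whole subtrees are skipped. In the toy example only 2 of the 10 possible gene pairs survive.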
Findings
Achieves a 90-98% reduction in the search space for 4-hit combinations.
Provides a speedup of approximately 183x over exhaustive search.
Handles higher-order gene hits efficiently on high-performance clusters.
Abstract
Cancer typically arises not from a single genetic mutation (i.e., hit) but from multi-hit combinations that accumulate within cells. However, enumerating multi-hit combinations becomes exponentially more expensive computationally as the number of candidate hit-gene combinations grows, i.e., on the order of 20,000 choose h, where 20,000 is the number of genes in the human genome and h is the number of hits. To address this challenge, we present an algorithmic framework, called Pruned Depth-First Search (P-DFS), that leverages the high sparsity in tumor mutation data to prune large portions of the search space. Specifically, P-DFS, the main contribution of this paper, is a pruning technique that exploits sparsity to drastically reduce the otherwise exponential h-hit search space for candidate combinations used by Weighted Set Cover, and it is grounded in a depth-first search backtracking…
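To give a sense of the scale behind "20,000 choose h", the short Python sketch below evaluates the binomial coefficient for a few values of h. The gene count of 20,000 comes from the abstract. The helper name `num_combinations` and the chosen range of h are illustrative only.

```python
import math

NUM_GENES = 20_000  # gene count quoted in the abstract


def num_combinations(h: int, n: int = NUM_GENES) -> int:
    """Number of distinct h-gene combinations drawn from n genes."""
    return math.comb(n, h)


for h in range(2, 6):
    # Scientific notation keeps the very large counts readable.
    print(f"h={h}: {num_combinations(h):.3e} combinations")
```

Each additional hit multiplies the count by (n - h) / (h + 1). Going from 3 to 4 hits therefore enlarges the exhaustive search space by a factor of roughly 5,000, which is why pruning matters for higher-order hits.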
Taxonomy
Topics: Genome Rearrangement Algorithms · Cancer Genomics and Diagnostics · Bioinformatics and Genomic Networks
