Network-based Distance Metric with Application to Discover Disease Subtypes in Cancer
Jipeng Qiang, Wei Ding, John Quackenbush, Ping Chen

TL;DR
This paper introduces a novel network-based distance metric for clustering sparse, high-dimensional gene mutational data to improve cancer subtype discovery, outperforming existing methods and identifying previously undetectable subtypes.
Contribution
A new network-based distance metric tailored for sparse mutational data enhances cancer subtype detection beyond current clustering algorithms.
Findings
Outperforms top competitors in synthetic data tests.
Detects novel cancer subtypes in real data.
Effective with extremely sparse mutational profiles.
Abstract
While we once thought of cancer as a single monolithic disease affecting a specific organ site, we now understand that there are many subtypes of cancer, each defined by a unique pattern of gene mutations. These gene mutational data, which can be obtained more reliably than gene expression data, help to determine how the subtypes develop, evolve, and respond to therapies. Unlike the dense, continuous-valued gene expression data that most existing cancer subtype discovery algorithms use, somatic mutational data are extremely sparse and heterogeneous: fewer than 0.5% of the 20,000 human protein-coding genes are mutated in a given patient (recorded as discrete 1/0 values), and identical mutated genes are rarely shared by cancer patients. Our focus is to search for cancer subtypes in extremely sparse, high-dimensional gene mutational data with discrete 1 and 0 values using unsupervised learning. We…
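To make the sparsity problem concrete, here is a minimal sketch of how a plain set-overlap (Jaccard) distance between two patients' mutated-gene sets hits its maximum of 1 whenever the patients share no mutated gene. It also shows how letting each gene pull in its neighbors from a gene interaction network can expose relatedness that the raw 1/0 profiles hide. This is a hypothetical illustration only. The toy genes `G1` to `G3`, the toy network, and the helper names `jaccard_distance` and `expand_with_neighbors` are assumptions. The one-hop neighbor expansion is a generic stand-in and is not the distance metric proposed in the paper, whose details are not given in this abstract.

```python
from typing import Dict, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets of mutated genes."""
    union = a | b
    if not union:
        # Two patients with no mutations at all are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def expand_with_neighbors(genes: Set[str], network: Dict[str, Set[str]]) -> Set[str]:
    """Add each mutated gene's direct neighbors in a (hypothetical) interaction network."""
    expanded = set(genes)
    for gene in genes:
        expanded |= network.get(gene, set())
    return expanded


# Toy, invented data: two patients whose mutated genes do not overlap.
patient_a = {"G1"}
patient_b = {"G2"}

# Toy undirected network (hypothetical edges G1-G2 and G1-G3).
toy_network = {
    "G1": {"G2", "G3"},
    "G2": {"G1"},
    "G3": {"G1"},
}

if __name__ == "__main__":
    # Raw binary profiles share nothing, so the distance is maximal.
    print(jaccard_distance(patient_a, patient_b))  # 1.0
    # After one-hop expansion the patients overlap on {G1, G2} out of {G1, G2, G3}.
    expanded_a = expand_with_neighbors(patient_a, toy_network)
    expanded_b = expand_with_neighbors(patient_b, toy_network)
    print(round(jaccard_distance(expanded_a, expanded_b), 3))  # 0.333
```

With fewer than 0.5% of 20,000 genes mutated (under about 100 genes per patient), exact overlaps between patients are rare, so a purely set-based distance puts most patient pairs at or near the maximum. That is the gap a network-aware metric aims to close.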
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Genomics and Phylogenetic Studies
