Factor Models for Cancer Signatures
Zura Kakushadze, Willie Yu

TL;DR
This paper adapts statistical risk models from quantitative finance to cancer genome data, extracting more stable cancer signatures at lower computational cost and reporting three novel signatures.
Contribution
It applies risk-modeling techniques from quantitative finance to cancer genomics, making the extraction of mutational signatures faster and more stable, and identifies three novel cancer signatures.
Findings
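Signatures extracted from filtered data show substantially lower variability than those from unfiltered data
The computational cost of signature extraction drops by about a factor of 10
Three novel cancer signatures are found, including a liver cancer dominant signature (96% contribution) and a renal cell carcinoma signature (70% contribution)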
Abstract
We present a novel method for extracting cancer signatures by applying statistical risk models (http://ssrn.com/abstract=2732453) from quantitative finance to cancer genome data. Using 1389 whole genome sequenced samples from 14 cancers, we identify an "overall" mode of somatic mutational noise. We give a prescription for factoring out this noise and source code for fixing the number of signatures. We apply nonnegative matrix factorization (NMF) to genome data aggregated by cancer subtype and filtered using our method. The resultant signatures have substantially lower variability than those from unfiltered data. Also, the computational cost of signature extraction is cut by about a factor of 10. We find 3 novel cancer signatures, including a liver cancer dominant signature (96% contribution) and a renal cell carcinoma signature (70% contribution). Our method accelerates finding new…
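The pipeline described in the abstract (factor out the "overall" noise mode, then run NMF on the filtered data) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the "overall" mode is modeled as the per-column mean of log(1 + counts), which is removed before re-exponentiating; the NMF step is a plain Lee-Seung multiplicative-update loop standing in for whatever NMF routine the paper uses; the 96 x 14 toy matrix, the function names `remove_overall_mode` and `nmf`, and the choice of 3 signatures are all hypothetical. The paper's own source code for fixing the number of signatures is not reproduced.

```python
import numpy as np


def remove_overall_mode(counts):
    """Filter a (categories x samples) matrix of nonnegative mutation counts.

    Assumption: the "overall" mode is approximated by the mean of
    log(1 + counts) within each column, and is subtracted in log space.
    """
    log_counts = np.log1p(counts)
    demeaned = log_counts - log_counts.mean(axis=0, keepdims=True)
    # Re-exponentiate so the result is strictly positive and NMF still applies.
    return np.exp(demeaned)


def nmf(matrix, n_signatures, n_iter=500, seed=0, eps=1e-12):
    """Generic NMF via multiplicative updates (Frobenius-norm objective).

    Returns W (categories x signatures) and H (signatures x samples)
    with matrix approximately equal to W @ H.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = matrix.shape
    W = rng.random((n_rows, n_signatures)) + eps
    H = rng.random((n_signatures, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ matrix) / (W.T @ W @ H + eps)
        W *= (matrix @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data only: random Poisson counts, not real genome data.
rng = np.random.default_rng(42)
toy_counts = rng.poisson(lam=20.0, size=(96, 14)).astype(float)

filtered = remove_overall_mode(toy_counts)
W, H = nmf(filtered, n_signatures=3)
```

In this sketch the columns of `W` play the role of candidate signatures and `H` holds their weights per column of the input. Because the filtered matrix has zero column means in log space, the shared scale of each column no longer dominates the factorization, which is the intuition behind removing a common mode before extracting signatures.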
Taxonomy
Topics: Genetic Associations and Epidemiology · Gene expression and cancer classification · Cancer Genomics and Diagnostics
