Factor Models for Cancer Signatures

Zura Kakushadze; Willie Yu

arXiv:1604.08743 · q-bio.GN · January 24, 2017

TL;DR

This paper introduces a statistical method, adapted from risk models in quantitative finance, that extracts more stable cancer signatures from genome data at lower computational cost. Using this method, the authors identify three novel signatures.

Contribution

It applies risk-modeling techniques from quantitative finance to cancer genomics. This makes the extraction of mutational signatures faster and more stable, and it leads to the identification of new cancer-specific signatures.

Findings

1. Lower variability in signatures extracted from filtered data
2. Roughly tenfold reduction in computational cost
3. Discovery of three novel cancer signatures

Abstract

We present a novel method for extracting cancer signatures by applying statistical risk models (http://ssrn.com/abstract=2732453) from quantitative finance to cancer genome data. Using 1389 whole genome sequenced samples from 14 cancers, we identify an "overall" mode of somatic mutational noise. We give a prescription for factoring out this noise and source code for fixing the number of signatures. We apply nonnegative matrix factorization (NMF) to genome data aggregated by cancer subtype and filtered using our method. The resultant signatures have substantially lower variability than those from unfiltered data. Also, the computational cost of signature extraction is cut by about a factor of 10. We find 3 novel cancer signatures, including a liver cancer dominant signature (96% contribution) and a renal cell carcinoma signature (70% contribution). Our method accelerates finding new…
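The abstract names two generic computational steps: fixing the number of signatures and running NMF on the aggregated counts. The following is a minimal sketch of those two steps only, under stated assumptions. It does not reproduce the authors' noise-filtering prescription or their source code. The effective rank (eRank, the exponential of the entropy of the normalized singular values) is used here as one plausible way to fix the number of signatures. Whether it matches the authors' procedure is an assumption. NMF is the textbook Lee-Seung multiplicative-update algorithm, not any particular package. The function names `effective_rank` and `nmf` are hypothetical.

```python
import numpy as np


def effective_rank(matrix):
    """Effective rank: exp of the Shannon entropy of the singular values
    after normalizing them to sum to 1 (assumed heuristic for choosing
    the number of signatures)."""
    s = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))


def nmf(v, k, n_iter=2000, seed=0, eps=1e-12):
    """Hypothetical helper: factor a nonnegative matrix v
    (mutation categories x samples or subtypes) as w @ h with
    Lee-Seung multiplicative updates for the Frobenius-norm loss."""
    v = np.asarray(v, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.random((v.shape[0], k))  # signatures (columns)
    h = rng.random((k, v.shape[1]))  # exposures (rows)
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so w and h stay >= 0.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    # Normalize each signature to sum to 1 and move the scale into the
    # exposures so that the product w @ h is unchanged.
    scale = w.sum(axis=0)
    return w / scale, h * scale[:, None]


if __name__ == "__main__":
    # Synthetic stand-in for aggregated mutation counts (not real data).
    rng = np.random.default_rng(1)
    v = rng.random((96, 3)) @ rng.random((3, 50))
    w, h = nmf(v, k=3)
    rel_err = np.linalg.norm(v - w @ h) / np.linalg.norm(v)
    print("eRank of v:", effective_rank(v))
    print("relative reconstruction error:", rel_err)
```

The multiplicative updates were chosen because they keep both factors nonnegative without an explicit projection step. In a real pipeline, `v` would hold the filtered, subtype-aggregated counts described in the abstract, and `k` would come from whatever rule is used to fix the number of signatures.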

Taxonomy

Topics: Genetic Associations and Epidemiology · Gene expression and cancer classification · Cancer Genomics and Diagnostics