Model selection and robust inference of mutational signatures using Negative Binomial non-negative matrix factorization
Marta Pelizzola, Ragnhild Laursen, Asger Hobolth

TL;DR
This paper introduces a Negative Binomial non-negative matrix factorization method with a novel model selection procedure for more accurately identifying mutational signatures in cancer genomes, especially under overdispersion.
Contribution
It proposes a Negative Binomial NMF with patient-specific dispersion parameters and a new cross-validation-inspired model selection method for robust mutational signature extraction.
Findings
The method outperforms classical approaches in simulations under model misspecification.
It more accurately determines the true number of signatures in overdispersed data.
Applied to real cancer datasets, it reveals meaningful mutational signatures.
Abstract
The spectrum of mutations in a collection of cancer genomes can be described by a mixture of a few mutational signatures. The mutational signatures can be found using non-negative matrix factorization (NMF). To extract the mutational signatures, we have to assume a distribution for the observed mutational counts and a number of mutational signatures. In most applications, the mutational counts are assumed to be Poisson distributed, and the rank is chosen by fitting several models that share the same underlying distribution but differ in rank, then comparing their fit using classical model selection procedures. However, the counts are often overdispersed, and thus the Negative Binomial distribution is more appropriate. We propose a Negative Binomial NMF with a patient-specific dispersion parameter to capture the variation across patients. We also introduce a novel model selection procedure…
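To make the contrast between the Poisson and Negative Binomial assumptions concrete, the following is a minimal sketch, not the authors' implementation. It assumes a mean-dispersion parameterization in which a count with mean mu and dispersion alpha has variance mu + mu**2 / alpha, so that large alpha recovers the Poisson model. The toy exposures, signatures, counts, and dispersion values are invented for illustration, and all function names are hypothetical.

```python
import math

def poisson_logpmf(v, mu):
    """log P(V = v) for a Poisson count with mean mu."""
    return v * math.log(mu) - mu - math.lgamma(v + 1)

def nb_logpmf(v, mu, alpha):
    """log P(V = v) for a Negative Binomial count with mean mu and
    dispersion alpha (assumed parameterization: Var = mu + mu**2 / alpha)."""
    return (math.lgamma(v + alpha) - math.lgamma(alpha) - math.lgamma(v + 1)
            + v * math.log(mu / (alpha + mu))
            + alpha * math.log(alpha / (alpha + mu)))

def mean_matrix(exposures, signatures):
    """NMF mean: M[n][m] = sum_k exposures[n][k] * signatures[k][m]."""
    n_types = len(signatures[0])
    return [[sum(e * sig[m] for e, sig in zip(row, signatures))
             for m in range(n_types)]
            for row in exposures]

def poisson_loglik(counts, means):
    """Total Poisson log-likelihood over all patients and mutation types."""
    return sum(poisson_logpmf(v, mu)
               for row_v, row_mu in zip(counts, means)
               for v, mu in zip(row_v, row_mu))

def nb_loglik(counts, means, alphas):
    """Total Negative Binomial log-likelihood with one dispersion per patient."""
    return sum(nb_logpmf(v, mu, a)
               for row_v, row_mu, a in zip(counts, means, alphas)
               for v, mu in zip(row_v, row_mu))

# Toy example (hypothetical numbers): 2 patients, 4 mutation types, 2 signatures.
signatures = [[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.2, 0.3, 0.4]]   # each signature sums to 1
exposures = [[100.0, 50.0],
             [20.0, 200.0]]           # per-patient signature loadings
means = mean_matrix(exposures, signatures)

counts = [[78, 17, 27, 31],           # patient 1: close to the means
          [10, 75, 40, 130]]          # patient 2: far more variable
alphas = [1000.0, 5.0]                # patient-specific dispersion

pois_ll = poisson_loglik(counts, means)
nb_ll = nb_loglik(counts, means, alphas)
print("Poisson log-likelihood:          ", round(pois_ll, 2))
print("Negative Binomial log-likelihood:", round(nb_ll, 2))
```

In this toy setting the second patient's counts scatter much more widely around their means than a Poisson model allows, so a small dispersion value for that patient raises the likelihood, while the first patient keeps a large value and stays close to Poisson. Fitting the factors and the dispersion parameters, and choosing the rank, are separate steps that this sketch does not attempt.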
Taxonomy
Topics: Gene expression and cancer classification · Genomic variations and chromosomal abnormalities · Genetic Associations and Epidemiology
