Topical Hidden Genome: Discovering Latent Cancer Mutational Topics using a Bayesian Multilevel Context-learning Approach
Saptarshi Chakraborty, Zoe Guan, Colin B. Begg, and Ronglai Shen

TL;DR
This paper introduces a Bayesian multilevel approach combined with topic modeling to interpret ultra-rare cancer genome mutations, enabling scalable, rigorous inference and revealing new biological insights.
Contribution
It presents a novel framework that integrates topic models with a hierarchical cancer mutation model for interpretable, scalable Bayesian inference on large genomic datasets.
Findings
Identified novel mutational topics associated with specific cancer types
Demonstrated scalable Bayesian inference on large-scale genomic data
Provided new biological insights into cancer mutational processes
Abstract
Statistical inference on the cancer-site specificities of collective ultra-rare whole-genome somatic mutations is an open problem. Traditional statistical methods cannot handle whole-genome mutation data because of its ultra-high dimensionality and extreme sparsity: for example, >30 million unique variants are observed in the ~1700 whole-genome tumor dataset considered herein, of which >99% are encountered only once. To harness the information in these rare variants, we recently proposed the "hidden genome model", a formal multilevel multi-logistic model that mines information in ultra-rare somatic variants to characterize tumor types. The model condenses signals in rare variants through a hierarchical layer that leverages the contexts of individual mutations. The model is currently implemented using consistent, scalable point estimation techniques that can handle tens of millions of…
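The two ideas in the abstract, extreme sparsity and condensing rare variants through their contexts, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the mutation-type context labels, the weights, and the tumor data below are all hypothetical, and the assumption that a variant contributes to a tumor's score only through a per-context weight is a simplification made for illustration.

```python
import math
from collections import Counter

def singleton_fraction(tumor_variants):
    """Fraction of unique variants that are observed in exactly one tumor."""
    counts = Counter(
        v for variants in tumor_variants.values() for v in set(variants)
    )
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

def softmax(scores):
    """Numerically stable softmax over a dict of class scores."""
    m = max(scores.values())
    exps = {k: math.exp(s - m) for k, s in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def predict_site(variants, variant_context, context_weights, intercepts):
    """Multi-logistic prediction where variants act only through their contexts.

    Assumption (illustrative only): each variant's effect equals the weight of
    its context, so a tumor's score is a weighted sum of context counts.
    """
    counts = Counter(variant_context[v] for v in variants)
    scores = {
        site: intercepts[site]
        + sum(context_weights[site].get(ctx, 0.0) * n for ctx, n in counts.items())
        for site in intercepts
    }
    return softmax(scores)

# Hypothetical toy data: three tumors, five variants, three contexts.
tumors = {"t1": ["v1", "v2", "v3"], "t2": ["v3", "v4"], "t3": ["v5"]}
variant_context = {"v1": "C>T", "v2": "C>T", "v3": "T>A", "v4": "C>A", "v5": "C>T"}
context_weights = {"skin": {"C>T": 1.0}, "lung": {"C>A": 1.0}}  # made-up weights
intercepts = {"skin": 0.0, "lung": 0.0}

frac = singleton_fraction(tumors)
probs = predict_site(tumors["t1"], variant_context, context_weights, intercepts)
```

Even though most toy variants appear in a single tumor, their contexts recur across tumors, so the context-level weights can still be estimated and used for prediction. That is the intuition behind pooling signal from ultra-rare variants in this sketch.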
Taxonomy
Topics: Cancer Genomics and Diagnostics · Gene expression and cancer classification · Genetic Associations and Epidemiology
