Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data
Ehsan Hajiramezanali, Siamak Zamani Dadaneh, Alireza Karbalayghareh, Mingyuan Zhou, and Xiaoning Qian

TL;DR
This paper introduces a Bayesian multi-domain learning model that effectively integrates diverse NGS count data for accurate cancer subtype discovery, addressing overdispersion and limited sample sizes in complex diseases.
Contribution
The paper presents a novel hierarchical negative binomial factorization model for multi-domain NGS data, improving cancer subtyping accuracy with small sample sizes and reducing negative transfer.
Findings
Effective multi-domain learning demonstrated on TCGA datasets
Outperforms existing transfer learning methods in cancer subtyping
Handles overdispersed count data accurately
Abstract
Precision medicine aims for personalized prognosis and therapeutics by utilizing recent genome-scale high-throughput profiling techniques, including next-generation sequencing (NGS). However, translating NGS data faces several challenges. First, NGS count data are often overdispersed, requiring appropriate modeling. Second, compared to the number of involved molecules and system complexity, the number of available samples for studying complex disease, such as cancer, is often limited, especially considering disease heterogeneity. The key question is whether we may integrate available data from all different sources or domains to achieve reproducible disease prognosis based on NGS count data. In this paper, we develop a Bayesian Multi-Domain Learning (BMDL) model that derives domain-dependent latent representations of overdispersed count data based on hierarchical negative binomial…
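To make the overdispersion point concrete, here is a minimal sketch (not the BMDL model itself) that computes the mean and variance of a single negative binomial distribution numerically. The parameterization, with shape r and probability p so that the mean is r·p/(1−p), is an assumption chosen for illustration, as are the function names and the example values.

```python
import math

def nb_log_pmf(k, r, p):
    # Log-probability of count k under NB(r, p), assuming the convention
    # P(k) = Gamma(k + r) / (k! * Gamma(r)) * p**k * (1 - p)**r.
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + k * math.log(p) + r * math.log1p(-p))

def nb_moments(r, p, k_max=2000):
    # Numerically sum the pmf over 0..k_max-1 to get the mean and variance.
    probs = [math.exp(nb_log_pmf(k, r, p)) for k in range(k_max)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var

# Hypothetical example values, not taken from the paper.
r, p = 2.0, 0.8
mean, var = nb_moments(r, p)

# A Poisson model forces variance == mean; the negative binomial allows
# variance = mean + mean**2 / r, which exceeds the mean for any finite r.
print(f"mean={mean:.3f}, variance={var:.3f}, mean + mean^2/r={mean + mean**2 / r:.3f}")
```

Under this convention the variance exceeds the mean by mean²/r, so a smaller r means stronger overdispersion. A Poisson model cannot capture that extra variance, which is why count data of this kind call for a negative binomial treatment.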
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies
