Sparse group factor analysis for biclustering of multiple data sources
Kerstin Bunte, Eemeli Leppäaho, Inka Saarinen, Samuel Kaski

TL;DR
This paper introduces a Bayesian method for joint biclustering of multiple data sources. It detects linear structure present in parts of heterogeneous genomic datasets, and it yields accurate predictions and interpretable biclusters on drug sensitivity data.
Contribution
It extends Group Factor Analysis to include biclustering with sparsity, allowing joint analysis of multiple data types for better pattern discovery.
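To make the biclustering reading of a sparse factor model concrete, here is a minimal sketch in Python with NumPy. It only simulates data under assumed settings and is not the authors' Bayesian inference procedure. The function names (`simulate_sparse_gfa`, `bicluster_of_factor`), the Bernoulli sparsity masks, and all parameter values are hypothetical choices for illustration. Each data source is generated as a shared sparse factor matrix times a source-specific sparse loading matrix plus noise, and a factor's bicluster is taken to be the samples and features with non-zero entries for that factor.

```python
import numpy as np


def simulate_sparse_gfa(n_samples=50, dims=(30, 20), n_factors=3,
                        density=0.3, noise_sd=0.1, seed=0):
    """Simulate several data sources that share sparse latent factors.

    Illustrative assumption: sparsity is imposed with simple Bernoulli
    masks, which is not necessarily the prior used in the paper.
    """
    rng = np.random.default_rng(seed)
    # Shared factor matrix X (samples x factors), sparse over samples.
    x_mask = rng.random((n_samples, n_factors)) < density
    X = rng.normal(size=(n_samples, n_factors)) * x_mask
    loadings, sources = [], []
    for d in dims:
        # Source-specific loading matrix W (features x factors), sparse over features.
        w_mask = rng.random((d, n_factors)) < density
        W = rng.normal(size=(d, n_factors)) * w_mask
        # Linear structure plus Gaussian noise.
        Y = X @ W.T + noise_sd * rng.normal(size=(n_samples, d))
        loadings.append(W)
        sources.append(Y)
    return X, loadings, sources


def bicluster_of_factor(X, W, k):
    """Return the (sample indices, feature indices) where factor k is active."""
    rows = np.flatnonzero(X[:, k])
    cols = np.flatnonzero(W[:, k])
    return rows, cols


if __name__ == "__main__":
    X, Ws, Ys = simulate_sparse_gfa()
    for m, W in enumerate(Ws):
        rows, cols = bicluster_of_factor(X, W, 0)
        print(f"source {m}: factor 0 covers {rows.size} samples x {cols.size} features")
```

Because factor k contributes the outer product of column k of X and column k of W, its contribution is exactly zero outside the returned rows and columns. This is what lets a sparse factor be read as a bicluster, and sharing X across sources is what ties the biclusters of different data types together.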
Findings
Reliable inference of biclusters from heterogeneous data sources
High prediction accuracy on drug sensitivity data
Provides biologically interpretable biclusters
Abstract
Motivation: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments. These biclustering techniques have focused on one data source, often gene expression data. We present a Bayesian approach for joint biclustering of multiple data sources, extending a recent method, Group Factor Analysis (GFA), to have a biclustering interpretation with additional sparsity assumptions. The resulting method enables data-driven detection of linear structure present in parts of the data sources.
Results: Our simulation studies show that the proposed method reliably infers biclusters from heterogeneous data sources. We tested the method on data from the NCI-DREAM drug sensitivity prediction challenge, resulting in an excellent prediction…
