# Bayesian outcome-guided multi-view mixture models with applications in molecular precision medicine

**Authors:** Paul D. W. Kirk, Filippo Pagani, Sylvia Richardson

arXiv: 2303.00318 · 2023-03-02

## TL;DR

This paper introduces a multi-view Bayesian mixture model that identifies multiple clustering structures in high-dimensional 'omics data, guided by clinical outcomes, to improve disease subtype discovery in molecular medicine.

## Contribution

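The paper proposes a semi-supervised, outcome-guided multi-view Bayesian clustering method. It groups variables into views, each with its own clustering structure, so that clusterings arising from distinct biological processes are kept separate, and it uses a response variable to steer inference toward the clusterings most relevant to stratified medicine.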

## Key findings

- Illustrative simulations show the model identifying multiple clustering structures within a single dataset.
- Applied to pan-cancer proteomics, revealing biologically meaningful subtypes.
- Successfully integrated multi-omics data for breast cancer subtyping.

## Abstract

Clustering is commonly performed as an initial analysis step for uncovering structure in 'omics datasets, e.g. to discover molecular subtypes of disease. The high-throughput, high-dimensional nature of these datasets means that they provide information on a diverse array of different biomolecular processes and pathways. Different groups of variables (e.g. genes or proteins) will be implicated in different biomolecular processes, and hence undertaking analyses that are limited to identifying just a single clustering partition of the whole dataset is therefore liable to conflate the multiple clustering structures that may arise from these distinct processes. To address this, we propose a multi-view Bayesian mixture model that identifies groups of variables ("views"), each of which defines a distinct clustering structure. We consider applications in stratified medicine, for which our principal goal is to identify clusters of patients that define distinct, clinically actionable disease subtypes. We adopt the semi-supervised, outcome-guided mixture modelling approach of Bayesian profile regression that makes use of a response variable in order to guide inference toward the clusterings that are most relevant in a stratified medicine context. We present the model, together with illustrative simulation examples, and examples from pan-cancer proteomics. We demonstrate how the approach can be used to perform integrative clustering, and consider an example in which different 'omics datasets are integrated in the context of breast cancer subtyping.
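The multi-view setting described in the abstract can be illustrated with a small toy simulation. The sketch below is not the authors' model or code: it only generates data in which two groups of variables carry independent clusterings and a binary outcome depends on just one of them. The function names (`simulate_two_view_data`, `agreement`), the Gaussian cluster means of ±3, and the outcome probabilities 0.9 and 0.1 are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_two_view_data(n=200, p_per_view=5, seed=0):
    """Toy data (hypothetical setup): two views, each with its own clustering."""
    rng = np.random.default_rng(seed)
    # Independent binary cluster labels for each view.
    z1 = rng.integers(0, 2, size=n)
    z2 = rng.integers(0, 2, size=n)
    # Within each view, the cluster mean is -3 or +3, plus unit-variance noise.
    X1 = np.where(z1[:, None] == 1, 3.0, -3.0) + rng.normal(size=(n, p_per_view))
    X2 = np.where(z2[:, None] == 1, 3.0, -3.0) + rng.normal(size=(n, p_per_view))
    X = np.hstack([X1, X2])
    # The binary outcome depends only on the view 1 labels (assumed rates).
    y = rng.binomial(1, np.where(z1 == 1, 0.9, 0.1))
    return X, y, z1, z2


def agreement(a, b):
    """Fraction of matching binary labels, allowing the two label names to swap."""
    match = np.mean(a == b)
    return max(match, 1.0 - match)


X, y, z1, z2 = simulate_two_view_data()

# Threshold one variable from each view to get a crude two-way split.
split_view1 = (X[:, 0] > 0).astype(int)
split_view2 = (X[:, 5] > 0).astype(int)

print("view 1 split vs z1:", agreement(split_view1, z1))
print("view 1 split vs z2:", agreement(split_view1, z2))
print("outcome rate gap across view 1 split:",
      abs(y[split_view1 == 1].mean() - y[split_view1 == 0].mean()))
print("outcome rate gap across view 2 split:",
      abs(y[split_view2 == 1].mean() - y[split_view2 == 0].mean()))
```

In this toy setup, a split based on a view 1 variable tracks `z1` but is close to chance with respect to `z2`, and only the view 1 split separates the outcome. A single partition fitted to all ten columns would have to reconcile both structures, which is the conflation the abstract warns about; an outcome-guided approach would favour the view whose clustering is associated with `y`.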

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2303.00318/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/2303.00318/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/2303.00318/full.md

---
Source: https://tomesphere.com/paper/2303.00318