Clustering high dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data
Damien McParland, Catherine M. Phillips, Lorraine Brennan, Helen M. Roche, Isobel Claire Gormley

TL;DR
This paper introduces a Bayesian mixture model for clustering high-dimensional mixed phenotypic and genotypic data to identify sub-phenotypes related to metabolic syndrome, demonstrating its effectiveness in uncovering meaningful groups and discriminatory variables.
Contribution
A novel latent variable Bayesian model with variable selection for joint analysis of phenotypic and genotypic data in high dimensions is proposed.
Findings
Identified two meaningful sub-phenotypes ('healthy' and 'at risk')
Discovered the key phenotypic and genotypic variables that discriminate between them
Sub-phenotypes correlate strongly with disease classification seven years later
Abstract
The LIPGENE-SU.VI.MAX study, like many others, recorded high dimensional continuous phenotypic data and categorical genotypic data. LIPGENE-SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE-SU.VI.MAX participants into homogeneous groups or sub-phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model which elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE-SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an…
