# A Bayesian Finite Mixture Model with Variable Selection for Data with Mixed-type Variables

**Authors:** Shu Wang, Jonathan G. Yabes, Chung-Chou H. Chang

arXiv: 1905.03680 · 2019-05-10

## TL;DR

This paper introduces a Bayesian finite mixture model that performs variable selection, handles biomarker data censored by limits of detection, and avoids the sensitivity to initial values of EM-based fitting.

## Contribution

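The paper develops a Bayesian finite mixture model, fitted by Gibbs sampling, that combines variable-importance estimation, imputation of biomarker values censored by detection limits, and clustering in a single procedure. This addresses the EM algorithm's sensitivity to initial values, a key limitation of existing mixture models.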

## Key findings

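- The model identifies important variables in mixed-type data through a spike-and-slab type of prior placed on each variable.
- It handles censored biomarker data in clinical datasets by filling in values below or above the limits of detection within the Gibbs sampler.
- Clustering performance was examined in simulations across various scenarios and in a real-data application to electronic health records.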
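
For readers unfamiliar with the term, a generic spike-and-slab prior can be written as below. This is a textbook form given for orientation only; the symbols $\theta_j$, $\gamma_j$, $\pi$, and $\tau^2$ are assumptions introduced here, and the paper's own prior may be parameterized differently.

```latex
% Generic spike-and-slab prior on a variable-specific effect \theta_j
% (illustrative notation, not taken from the paper)
\begin{align}
  \gamma_j &\sim \mathrm{Bernoulli}(\pi), \\
  \theta_j \mid \gamma_j &\sim (1-\gamma_j)\,\delta_0 + \gamma_j\,\mathcal{N}(0, \tau^2),
\end{align}
% where \delta_0 is a point mass at zero (the "spike") and
% \mathcal{N}(0, \tau^2) is a diffuse "slab".
```

Under a prior of this kind, the posterior inclusion probability $P(\gamma_j = 1 \mid \text{data})$, estimated as the fraction of Gibbs iterations in which $\gamma_j = 1$, is a natural measure of the importance of variable $j$.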

## Abstract

Finite mixture models are an important branch of clustering methods and can be applied to data sets with mixed types of variables. However, challenges exist in their application. First, they typically rely on the EM algorithm, which can be sensitive to the choice of initial values. Second, biomarkers subject to limits of detection (LOD) are common in clinical data, which introduces censored variables into the finite mixture model. Additionally, researchers have recently become more interested in variable importance because of the increasing number of variables available for clustering. To address these challenges, we propose a Bayesian finite mixture model that simultaneously conducts variable selection, accounts for biomarker LOD, and produces clustering results. We took a Bayesian approach to obtain parameter estimates and cluster membership, bypassing the limitation of the EM algorithm. To account for LOD, we added one more step to the Gibbs sampler that iteratively fills in biomarker values below or above the LODs. In addition, we placed a spike-and-slab type of prior on each variable to obtain variable importance. Simulations across various scenarios were conducted to examine the performance of this method. A real-data application to electronic health records was also conducted.
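
The LOD step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a biomarker is normally distributed within each cluster, so a censored value is redrawn from that normal distribution truncated at the detection limit. The function names (`impute_censored`, `imputation_step`), the `"obs"`/`"below"`/`"above"` flags, and the normal assumption are hypothetical and are not taken from the paper.

```python
import random
from statistics import NormalDist


def impute_censored(mu, sigma, lod, side, rng):
    """Draw one value from N(mu, sigma^2) truncated beyond a detection limit.

    side="below": the true value is only known to be below `lod`.
    side="above": the true value is only known to be above `lod`.
    Uses inverse-CDF sampling on the truncated probability range.
    """
    dist = NormalDist(mu, sigma)
    p_lod = dist.cdf(lod)
    if side == "below":
        lo, hi = 0.0, p_lod
    elif side == "above":
        lo, hi = p_lod, 1.0
    else:
        raise ValueError("side must be 'below' or 'above'")
    # inv_cdf requires a probability strictly inside (0, 1).
    eps = 1e-12
    u = min(max(rng.uniform(lo, hi), eps), 1.0 - eps)
    x = dist.inv_cdf(u)
    # Guard against the clamp pushing a draw across the limit.
    return min(x, lod) if side == "below" else max(x, lod)


def imputation_step(values, flags, labels, means, sds, lod_low, lod_high, rng):
    """One imputation pass: observed values are kept, censored ones redrawn.

    `labels[i]` is the current cluster of observation i; `means` and `sds`
    hold the current cluster-specific parameter draws (assumed inputs).
    """
    filled = []
    for x, flag, k in zip(values, flags, labels):
        if flag == "below":
            filled.append(impute_censored(means[k], sds[k], lod_low, "below", rng))
        elif flag == "above":
            filled.append(impute_censored(means[k], sds[k], lod_high, "above", rng))
        else:
            filled.append(x)
    return filled


rng = random.Random(0)
values = [1.2, None, 3.4, None]          # None marks a censored measurement
flags = ["obs", "below", "obs", "above"]
labels = [0, 0, 1, 1]                    # current cluster assignments
filled = imputation_step(values, flags, labels, [1.0, 3.0], [0.5, 0.5], 0.5, 4.0, rng)
print(filled)
```

In a full sampler, a pass like this would alternate with updates of the cluster labels and parameters, so the imputed values and the parameter draws inform each other across iterations.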

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.03680/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1905.03680/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/1905.03680/full.md

---
Source: https://tomesphere.com/paper/1905.03680