A Bayesian Finite Mixture Model Approach for Mixed-type Data Clustering and Variable Selection with Censored Biomarkers

Yueting Wang; Shu Wang; Jonathan G. Yabes; Chung-Chou H. Chang

arXiv:2603.29316·stat.AP·April 23, 2026

TL;DR

This paper introduces a Bayesian finite mixture model that effectively clusters mixed-type biomedical data, handles censored biomarkers, and identifies important variables, improving subgroup discovery in heterogeneous patient populations.

Contribution

The proposed BFMM framework is novel in modeling mixed data types, incorporating dependency structures, handling censored data, and quantifying variable importance within a unified Bayesian approach.

Findings

01 BFMM outperforms existing methods in clustering accuracy.

02 It reliably distinguishes informative features from noise.

03 Applied to real datasets, BFMM identified meaningful clinical phenotypes.

Abstract

Clustering mixed-type data to uncover clinically meaningful subgroups within heterogeneous patient populations remains a major challenge in biomedical research. Most existing clustering methods impose restrictive assumptions such as local independence, fail to accommodate censored biomarkers, or are unable to quantify variable importance. We propose a Bayesian finite mixture model (BFMM) clustering framework that addresses these limitations. BFMM flexibly models both continuous and categorical variables, incorporates three covariance structures to capture cluster-specific dependencies among continuous features, and handles censored observations through likelihood-based imputation. To facilitate feature prioritization, BFMM uses spike-and-slab priors to estimate variable importance on a continuous 0-1 scale. Simulation studies demonstrate that BFMM outperforms existing methods in clustering…
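
To make the idea of a censored biomarker inside a mixture likelihood concrete, here is a minimal sketch and not the paper's implementation. It assumes a single continuous biomarker, univariate Gaussian cluster components, and left-censoring at a known limit of detection; the abstract specifies none of these details. The function names and the `lod` parameter are hypothetical. An observed value contributes the Gaussian density, while a censored value contributes the probability mass at or below the detection limit.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def component_likelihood(x, mu, sigma, lod=None):
    """Likelihood contribution of one biomarker value under one cluster.

    Assumption for illustration: values at or below `lod` are left-censored,
    so only P(X <= lod) is known for them.
    """
    if lod is not None and x <= lod:
        return normal_cdf(lod, mu, sigma)
    return normal_pdf(x, mu, sigma)


def responsibilities(x, weights, mus, sigmas, lod=None):
    """Posterior cluster membership probabilities for one observation,
    given fixed mixture weights and component parameters."""
    contribs = [
        w * component_likelihood(x, m, s, lod)
        for w, m, s in zip(weights, mus, sigmas)
    ]
    total = sum(contribs)
    return [c / total for c in contribs]


# Two hypothetical clusters: one with low biomarker levels, one with high.
weights = [0.5, 0.5]
mus = [-1.0, 3.0]
sigmas = [1.0, 1.0]

# A reading recorded at the detection limit (lod = 0.0) is treated as censored.
censored_resp = responsibilities(0.0, weights, mus, sigmas, lod=0.0)

# A fully observed reading well above the limit.
observed_resp = responsibilities(2.8, weights, mus, sigmas, lod=0.0)
```

In this toy setup the censored reading is assigned mostly to the low-mean cluster, because that cluster places far more probability mass below the detection limit. In a full Bayesian treatment the component parameters would themselves be sampled instead of being fixed as they are here.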
