A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random

Binh H. Ho; Long Nguyen Chi; TrungTin Nguyen; Binh T. Nguyen; Van Ha Hoang; Christopher Drovandi

arXiv:2505.19093·stat.ME·November 5, 2025

TL;DR

This paper presents a unified framework for variable selection in model-based clustering that handles data missing not at random (MNAR), improving the identification of meaningful subgroups in complex datasets.

Contribution

It introduces an integrated approach that combines a data-driven penalty matrix with an explicit model of how missingness relates to latent class membership, improving clustering accuracy and variable selection when data are missing not at random.

Findings

01

Achieves asymptotic and selection consistency under regularity conditions.

02

Demonstrates improved clustering performance on synthetic and real transcriptomic data.

03

Enhances flexibility and efficiency of model-based clustering with missing data.

Abstract

Model-based clustering integrated with variable selection is a powerful tool for uncovering latent structures within complex data. However, its effectiveness is often hindered by challenges such as identifying relevant variables that define heterogeneous subgroups and handling data that are missing not at random, a prevalent issue in fields like transcriptomics. While several notable methods have been proposed to address these problems, they typically tackle each issue in isolation, thereby limiting their flexibility and adaptability. This paper introduces a unified framework designed to address these challenges simultaneously. Our approach incorporates a data-driven penalty matrix into penalized clustering to enable more flexible variable selection, along with a mechanism that explicitly models the relationship between missingness and latent class membership. We demonstrate that, under…
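
To make the idea of missingness that depends on latent class membership concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a simple Gaussian mixture in which each latent class has its own probability of masking an entry; the function names, the class means, and the per-class probabilities are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_class_dependent_missingness(n=2000, p=4, miss_prob=(0.1, 0.5), seed=0):
    """Simulate mixture data whose missingness rate depends on the latent class.

    Assumption (illustrative only): class k masks each entry independently
    with probability miss_prob[k]. This is one simple way missingness can be
    tied to class membership; the paper's mechanism may differ.
    """
    rng = np.random.default_rng(seed)
    n_classes = len(miss_prob)

    # Latent class label for each observation.
    z = rng.integers(0, n_classes, size=n)

    # Hypothetical class means, evenly spaced between -2 and 2.
    means = np.linspace(-2.0, 2.0, n_classes)[:, None] * np.ones(p)
    x = means[z] + rng.standard_normal((n, p))

    # Per-row masking probability, broadcast across the p variables.
    row_prob = np.asarray(miss_prob)[z][:, None]
    mask = rng.random((n, p)) < row_prob

    # Masked entries are reported as NaN in the observed data.
    x_obs = np.where(mask, np.nan, x)
    return x_obs, z, mask


def missing_rate_by_class(mask, z, n_classes):
    """Empirical fraction of masked entries within each latent class."""
    return np.array([mask[z == k].mean() for k in range(n_classes)])


if __name__ == "__main__":
    x_obs, z, mask = simulate_class_dependent_missingness()
    print(missing_rate_by_class(mask, z, n_classes=2))
```

In this sketch the pattern of missing entries is itself informative about the class label, so a method that discards or ignores the mask loses information that could help separate the clusters.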

Videos

A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random · SlidesLive

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Data Management and Algorithms