A Bayesian Model for Co-clustering Ordinal Data with Informative Missing Entries
Alice Giampino, Antonio Canale, Bernardo Nipoti

TL;DR
This paper introduces a Bayesian nonparametric co-clustering model for multivariate ordinal data that treats missing and censored entries as informative rather than as absent information, which the authors report improves understanding of the underlying data structure.
Contribution
It presents a novel Bayesian model that uses two independent Dirichlet processes to cluster subjects and variables, and latent variables to account for the ordinal scale, in order to handle informative missingness and high-dimensional ordinal data.
Findings
Outperforms existing methods in simulations
Effectively identifies subpopulations in real data
Handles high-dimensional data efficiently
Abstract
Several approaches have been proposed in the literature for clustering multivariate ordinal data. These methods typically treat missing values as absent information, rather than recognizing them as valuable for profiling population characteristics. To address this gap, we introduce a Bayesian nonparametric model for co-clustering multivariate ordinal data that treats censored observations as informative, rather than merely missing. We demonstrate that this offers a significant improvement in understanding the underlying structure of the data. Our model exploits the flexibility of two independent Dirichlet processes, allowing us to infer potentially distinct subpopulations that characterize the latent structure of both subjects and variables. The ordinal nature of the data is addressed by introducing latent variables, while a matrix factorization specification is adopted to handle the…
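The ingredients named in the abstract (two independent Dirichlet processes, one partitioning subjects and one partitioning variables, plus latent variables that map to ordinal categories) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model or sampler: the Chinese restaurant process stands in for the partition induced by each Dirichlet process, and the Gaussian block means, the fixed cutpoints, the logistic missingness rule, and every function and parameter name (crp_partition, simulate_coclustered_ordinal, alpha_rows, alpha_cols, cutpoints) are hypothetical choices made only for illustration.

```python
import numpy as np


def crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process."""
    labels = np.zeros(n, dtype=int)
    counts = [1]  # the first item opens the first cluster
    for i in range(1, n):
        # Existing clusters are weighted by size, a new cluster by alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels[i] = k
    return labels


def simulate_coclustered_ordinal(n_rows=30, n_cols=8, alpha_rows=1.0,
                                 alpha_cols=1.0, cutpoints=(-1.0, 0.0, 1.0),
                                 seed=0):
    """Simulate an ordinal matrix with block structure and informative gaps."""
    rng = np.random.default_rng(seed)

    # Two independent partitions: one over subjects (rows), one over variables (columns).
    row_labels = crp_partition(n_rows, alpha_rows, rng)
    col_labels = crp_partition(n_cols, alpha_cols, rng)

    # Assumption: one Gaussian latent mean per (row cluster, column cluster) block.
    n_row_clusters = row_labels.max() + 1
    n_col_clusters = col_labels.max() + 1
    block_means = rng.normal(0.0, 1.5, size=(n_row_clusters, n_col_clusters))
    latent = block_means[row_labels][:, col_labels] + rng.normal(size=(n_rows, n_cols))

    # Ordinal category = 1 + number of cutpoints below the latent value.
    ordinal = np.searchsorted(np.asarray(cutpoints), latent) + 1

    # Assumption (illustrative only): entries with large latent values are more
    # likely to be missing, so the missingness pattern carries information.
    p_missing = 1.0 / (1.0 + np.exp(-(latent - 2.0)))
    observed = rng.random((n_rows, n_cols)) >= p_missing

    data = np.where(observed, ordinal, 0)  # 0 encodes a missing entry
    return data, row_labels, col_labels


if __name__ == "__main__":
    data, row_labels, col_labels = simulate_coclustered_ordinal()
    print(data.shape, row_labels.max() + 1, col_labels.max() + 1)
```

In this toy setup, discarding the zeros would throw away the fact that missing entries concentrate in blocks with high latent means, which is the kind of information the abstract argues should be modeled rather than ignored.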
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications · Data Management and Algorithms
