Learning from missing data with the Latent Block Model
Gabriel Frisch (Heudiasyc), Jean-Benoist Léger (Heudiasyc), Yves Grandvalet (Heudiasyc)

TL;DR
This paper introduces a co-clustering model based on the Latent Block Model that exploits informative missing data, specifically data Missing Not At Random (MNAR), to improve the analysis of datasets such as parliamentary voting records.
Contribution
The paper develops a novel co-clustering approach for MNAR data using the Latent Block Model with a variational EM algorithm and model selection, addressing a gap in handling nonignorable missingness.
Findings
Effectively identifies meaningful groups in simulated data.
Reveals relevant MP and text clusters in French Parliament voting data.
Provides interpretable insights into non-voter behavior.
Abstract
Missing data can be informative. Ignoring this information can lead to misleading conclusions when the data model does not allow information to be extracted from the missing data. We propose a co-clustering model, based on the Latent Block Model, that aims to take advantage of these nonignorable nonresponses, also known as Missing Not At Random (MNAR) data. A variational expectation-maximization algorithm is derived to perform inference and a model selection criterion is presented. We assess the proposed approach on a simulation study, before using our model on the voting records from the lower house of the French Parliament, where our analysis brings out relevant groups of MPs and texts, together with a sensible interpretation of the behavior of non-voters.
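To make the abstract's first claim concrete, the following toy simulation shows how a naive estimate drifts when missingness depends on the unobserved value. It is only an illustrative sketch, not the paper's model or its inference algorithm: the 2x2 block probabilities, the missingness rates, and the function name `simulate` are all assumptions chosen for the example.

```python
import random

def simulate(n_rows=200, n_cols=200, seed=0):
    """Toy MNAR example on a binary matrix with a block structure.

    All numeric values below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Assumed probability of a "yes" (1) for each (row cluster, column cluster) pair.
    pi = [[0.9, 0.2], [0.3, 0.8]]
    row_cluster = [rng.randrange(2) for _ in range(n_rows)]
    col_cluster = [rng.randrange(2) for _ in range(n_cols)]
    # MNAR mechanism (assumed): a "no" (0) goes unrecorded far more often than a "yes".
    p_missing = {1: 0.1, 0: 0.6}

    full, observed = [], []
    for i in range(n_rows):
        for j in range(n_cols):
            x = 1 if rng.random() < pi[row_cluster[i]][col_cluster[j]] else 0
            full.append(x)
            # Keep the entry only if it is not masked by the missingness process.
            if rng.random() >= p_missing[x]:
                observed.append(x)

    true_rate = sum(full) / len(full)            # rate over the complete matrix
    naive_rate = sum(observed) / len(observed)   # rate over observed entries only
    return true_rate, naive_rate

true_rate, naive_rate = simulate()
print(f"true rate of 1s: {true_rate:.3f}, naive rate from observed entries: {naive_rate:.3f}")
```

Because zeros are masked more often than ones, the rate computed from observed entries alone overstates the true rate. A model that treats the missing entries as ignorable inherits this bias, which is the situation the proposed approach is meant to address.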
Taxonomy
Topics: Bayesian Methods and Mixture Models · Computational and Text Analysis Methods · Bayesian Modeling and Causal Inference
