IlocA: An algorithm to Cluster Cells and form Imputation Groups from a pair of Classification Variables

Gerard Keogh

arXiv:2302.11916 · stat.ME · February 24, 2023

TL;DR

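IlocA is a novel, model-free clustering algorithm that aggregates cells with small frequency counts based on log odds ratios. It is used to build imputation cells for missing values of a continuous variable, preserving dependence in the table and forming groups that are homogeneous with respect to the cell response mean.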

Contribution

It introduces a bottom-up, dependence-preserving clustering method for cells in a two-way classification, enhancing imputation accuracy for missing data.

Findings

01

In simulations, IlocA groups together only independent cells, and does so consistently.

02

The method generates close to an optimal number of imputation cells.

03

Imputed means are accurate under ignorable and non-ignorable missingness.

Abstract

We set out a novel bottom-up procedure to aggregate or cluster cells with small frequency counts in a two-way classification while maintaining dependence in the table. The procedure is model-free. It combines cells in a table into clusters based on independent log odds ratios. We use this procedure to build a set of statistically efficient and robust imputation cells for the imputation of missing values of a continuous variable using a pair of classification variables. A useful feature of the procedure is that it forms aggregation groups that are homogeneous with respect to the cell response mean. Using a series of simulation studies, we show that IlocA groups together only independent cells and does so in a consistent and credible way. While imputing missing data, we show that IlocA generates close to an optimal number of imputation cells. For ignorable non-response the resulting imputed means are…
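
The abstract does not spell out how the log odds ratios are computed or tested, so the following is only a rough illustration of the underlying idea, not the paper's algorithm. It computes the local log odds ratio of each adjacent 2x2 sub-table in a two-way frequency table and flags those that are not significantly different from zero, which would be candidates for merging. The function name `local_log_odds_ratios`, the 0.5 continuity correction, the Wald-type standard error, and the `z_crit` threshold are all assumptions made for this sketch.

```python
import math


def local_log_odds_ratios(table, z_crit=1.96):
    """Return (i, j, log_or, se, looks_independent) for every adjacent
    2x2 sub-table of a two-way frequency table (list of equal-length rows).

    Hypothetical helper for illustration; not taken from the paper.
    """
    results = []
    for i in range(len(table) - 1):
        for j in range(len(table[0]) - 1):
            # Assumption: add 0.5 to each count so zero cells stay finite.
            a = table[i][j] + 0.5
            b = table[i][j + 1] + 0.5
            c = table[i + 1][j] + 0.5
            d = table[i + 1][j + 1] + 0.5
            log_or = math.log((a * d) / (b * c))
            # Wald-type standard error of a log odds ratio.
            se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
            # Assumption: treat the cells as independent when the log odds
            # ratio is within z_crit standard errors of zero.
            looks_independent = abs(log_or) <= z_crit * se
            results.append((i, j, log_or, se, looks_independent))
    return results


if __name__ == "__main__":
    counts = [[10, 10, 3], [10, 10, 40]]
    for i, j, log_or, se, indep in local_log_odds_ratios(counts):
        print(f"rows {i},{i + 1} cols {j},{j + 1}: "
              f"log OR = {log_or:.3f}, SE = {se:.3f}, independent = {indep}")
```

A log odds ratio near zero means the four cells carry no evidence of association, so pooling them should not distort the dependence structure of the table. That is the intuition behind aggregating on this quantity.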

Taxonomy

Topics: Advanced Statistical Modeling Techniques · Advanced Clustering Algorithms Research · Neural Networks and Applications