IlocA: An algorithm to Cluster Cells and form Imputation Groups from a pair of Classification Variables
Geraard Keogh

TL;DR
IlocA is a novel, model-free clustering algorithm that uses log odds ratios to aggregate cells with small frequency counts. This improves the imputation of missing continuous data while preserving dependence in the table and keeping cell response means homogeneous.
Contribution
It introduces a bottom-up, dependence-preserving clustering method for cells in a two-way classification, enhancing imputation accuracy for missing data.
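The summary does not spell out IlocA's merging rule. A standard way to judge whether neighbouring cells of a two-way table behave independently is to compute the local log odds ratio of the 2×2 sub-table they form, along with its Wald standard error. The sketch below is an illustrative assumption, not the author's algorithm. The function names `local_log_odds` and `independent_pairs`, the 0.5 continuity correction and the 1.96 cut-off are all hypothetical choices.

```python
import math


def local_log_odds(table, i, j, correction=0.5):
    """Local log odds ratio of the 2x2 block at rows i, i+1 and columns j, j+1.

    Returns (log_or, se), where se is the usual Wald standard error.
    The continuity correction (an assumed 0.5) is added to every count so
    that small or zero cells do not produce infinite estimates.
    """
    a = table[i][j] + correction
    b = table[i][j + 1] + correction
    c = table[i + 1][j] + correction
    d = table[i + 1][j + 1] + correction
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def independent_pairs(table, z_crit=1.96):
    """List (i, j, z) for every adjacent 2x2 block whose local log odds
    ratio is not significantly different from zero at the cut-off z_crit.

    Such blocks are candidates for merging without losing dependence
    elsewhere in the table (hypothetical criterion for illustration).
    """
    n_rows, n_cols = len(table), len(table[0])
    result = []
    for i in range(n_rows - 1):
        for j in range(n_cols - 1):
            log_or, se = local_log_odds(table, i, j)
            z = log_or / se
            if abs(z) < z_crit:
                result.append((i, j, z))
    return result


counts = [
    [12, 10, 3],
    [11, 9, 20],
    [2, 3, 25],
]
print([(i, j) for i, j, _ in independent_pairs(counts)])
```

With this table, the blocks anchored at (0, 0), (1, 0) and (1, 1) are not significant at the 1.96 cut-off. The block at (0, 1) has a z-value of about 2.55, so under this assumed rule its cells would be kept apart.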
Findings
In simulations, IlocA groups together only independent cells and does so consistently.
The method produces a close-to-optimal number of imputation cells.
Imputed means are accurate under ignorable and non-ignorable missingness.
Abstract
We set out a novel bottom-up procedure that aggregates, or clusters, cells with small frequency counts in a two-way classification while maintaining dependence in the table. The procedure is model free. It combines cells into clusters based on independent log odds ratios. We use this procedure to build a set of statistically efficient and robust imputation cells for imputing missing values of a continuous variable from a pair of classification variables. A useful feature of the procedure is that the aggregation groups it forms are homogeneous with respect to the cell response mean. Using a series of simulation studies, we show that IlocA only groups together independent cells and does so in a consistent and credible way. When imputing missing data, we show that IlocA generates close to an optimal number of imputation cells. For ignorable non-response the resulting imputed means are…
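Once cells have been clustered, the clusters can be used as imputation cells. The simplest rule replaces each missing response with the mean of the observed responses in its group. The abstract does not say which imputation rule the paper uses, so this sketch assumes within-group mean imputation. The function `impute_by_group` and the record and mapping formats are hypothetical. Because IlocA aims for groups that are homogeneous in the response mean, the group mean is a natural donor value under this assumption.

```python
from statistics import mean


def impute_by_group(records, group_of):
    """Fill missing responses (None) with the observed mean of their group.

    records  : list of (row_category, col_category, value_or_None)
    group_of : dict mapping (row_category, col_category) -> group label,
               e.g. the output of a cell-clustering step (hypothetical format)
    """
    # Collect observed responses per imputation group.
    observed = {}
    for r, c, y in records:
        if y is not None:
            observed.setdefault(group_of[(r, c)], []).append(y)
    group_means = {g: mean(ys) for g, ys in observed.items()}

    imputed = []
    for r, c, y in records:
        if y is None:
            g = group_of[(r, c)]
            if g not in group_means:
                raise ValueError(f"group {g!r} has no observed responses")
            y = group_means[g]
        imputed.append((r, c, y))
    return imputed


groups = {("A", "x"): 0, ("A", "y"): 0, ("B", "x"): 1, ("B", "y"): 1}
data = [
    ("A", "x", 10.0),
    ("A", "y", 14.0),
    ("A", "y", None),
    ("B", "x", 3.0),
    ("B", "y", None),
]
result = impute_by_group(data, groups)
print(result)
```

Here the missing ("A", "y") value receives the group 0 mean of 12.0, and the missing ("B", "y") value receives the group 1 mean of 3.0. A group with no observed responses raises an error rather than being silently filled.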
