# IlocA: An algorithm to Cluster Cells and form Imputation Groups from a pair of Classification Variables

**Authors:** Geraard Keogh

arXiv: 2302.11916 · 2023-02-24

## TL;DR

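IlocA is a novel, model-free clustering algorithm that uses log odds ratios to aggregate cells with small frequency counts in a two-way table. The resulting groups serve as imputation cells for missing continuous data. They preserve the dependence in the table and are homogeneous in the cell response mean.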
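A log odds ratio compares two rows and two columns of the table, and a value near zero suggests that those four cells are independent. The sketch below computes the log odds ratio of a 2x2 subtable and a Wald z statistic for it. This is a standard textbook calculation, not the paper's implementation. The 0.5 continuity correction and the 1.96 cut-off are assumptions, and the function names are hypothetical.

```python
import math


def local_log_odds_ratio(n11, n12, n21, n22, correction=0.5):
    """Return (log odds ratio, Wald z) for the 2x2 subtable [[n11, n12], [n21, n22]].

    A continuity correction (assumed 0.5) is added to every count so that
    empty cells do not produce log(0) or division by zero.
    """
    a, b, c, d = (x + correction for x in (n11, n12, n21, n22))
    log_or = math.log(a * d / (b * c))
    # Large-sample standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or / se


def looks_independent(n11, n12, n21, n22, z_crit=1.96):
    """Hypothetical rule: call the subtable independent if |z| < z_crit."""
    _, z = local_log_odds_ratio(n11, n12, n21, n22)
    return abs(z) < z_crit


print(looks_independent(10, 12, 9, 11))  # True: near-proportional counts
print(looks_independent(30, 5, 4, 28))   # False: strong association
```

Small cells make the standard error large, which is why sparse cells can look "independent" even when their proportions differ somewhat. Any practical rule would need to account for that.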

## Contribution

It introduces a bottom-up, dependence-preserving method for clustering the cells of a two-way classification, which it uses to form statistically efficient and robust imputation cells for a continuous variable with missing values.
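As a rough illustration of bottom-up aggregation, the sketch below starts with every column of a two-way table as its own cluster. At each step it merges the pair of clusters that looks most independent, meaning that no pair of rows gives a significant log odds ratio. It stops when no pair passes. This is a hypothetical simplification, not IlocA itself. It merges whole columns rather than individual cells, and the greedy "smallest maximum |z|" rule, the 0.5 correction and the 1.96 cut-off are all assumptions.

```python
import math
from itertools import combinations


def max_abs_z(col_a, col_b, correction=0.5):
    """Largest |Wald z| of the log odds ratio over all row pairs of two columns."""
    worst = 0.0
    for i, k in combinations(range(len(col_a)), 2):
        a, b = col_a[i] + correction, col_b[i] + correction
        c, d = col_a[k] + correction, col_b[k] + correction
        log_or = math.log(a * d / (b * c))
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        worst = max(worst, abs(log_or) / se)
    return worst


def merge_columns(table, z_crit=1.96):
    """Greedily merge columns of a rows x cols count table (hypothetical rule).

    Returns a sorted list of clusters, each a sorted list of original column indices.
    """
    n_rows = len(table)
    # Each cluster holds (original column indices, summed counts per row).
    clusters = [([j], [table[i][j] for i in range(n_rows)])
                for j in range(len(table[0]))]
    while len(clusters) > 1:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            z = max_abs_z(clusters[p][1], clusters[q][1])
            if z < z_crit and (best is None or z < best[0]):
                best = (z, p, q)
        if best is None:  # no remaining pair looks independent
            break
        _, p, q = best
        merged = (clusters[p][0] + clusters[q][0],
                  [x + y for x, y in zip(clusters[p][1], clusters[q][1])])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return sorted(sorted(cols) for cols, _ in clusters)


# Columns 0 and 1 have near-proportional counts; column 2 does not.
print(merge_columns([[10, 20, 40], [10, 21, 2]]))  # [[0, 1], [2]]
```

Pooling the counts after each merge keeps the table's margins intact. Clusters that differ sharply in their row distributions stay separate, which is how this sketch keeps the dependence in the table.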

## Key findings

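- In simulations, IlocA groups together only independent cells, and does so consistently.
- When imputing missing data, IlocA generates close to the optimal number of imputation cells.
- Under ignorable non-response, imputed means are generally accurate; under non-ignorable missingness, results are consistent with those reported elsewhere.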

## Abstract

We set out a novel bottom-up procedure to aggregate, or cluster, cells with small frequency counts in a two-way classification while maintaining dependence in the table. The procedure is model free. It combines cells in a table into clusters based on independent log odds ratios. We use this procedure to build a set of statistically efficient and robust imputation cells for imputing missing values of a continuous variable using a pair of classification variables. A useful feature of the procedure is that it forms aggregation groups that are homogeneous with respect to the cell response mean. Using a series of simulation studies, we show that IlocA only groups together independent cells and does so in a consistent and credible way. When imputing missing data, we show that IlocA generates close to an optimal number of imputation cells. For ignorable non-response, the resulting imputed means are generally accurate. With non-ignorable missingness, results are consistent with those obtained elsewhere. We close with a case study applying our method to imputing missing building energy performance data.
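After clustering, each imputation group supplies donors for the missing values of its own cells. The abstract does not say which imputation rule is used. The sketch below assumes the simplest one, cell-mean imputation within groups, where each missing value is replaced by the mean of the observed values in the same group. The function name and the `group_of` mapping are hypothetical. Hot-deck or regression imputation could be used within the same groups instead.

```python
from collections import defaultdict


def impute_group_means(records, group_of):
    """Fill missing y (None) with the observed mean of its imputation group.

    records:  list of (row_level, col_level, y) tuples; y is None when missing.
    group_of: dict mapping (row_level, col_level) -> imputation group label
              (assumed to come from a clustering step).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, c, y in records:
        if y is not None:
            g = group_of[(r, c)]
            sums[g] += y
            counts[g] += 1
    imputed = []
    for r, c, y in records:
        if y is None:
            g = group_of[(r, c)]
            if counts[g] == 0:
                raise ValueError(f"group {g!r} has no observed values")
            y = sums[g] / counts[g]
        imputed.append((r, c, y))
    return imputed


group_of = {("A", "x"): 1, ("A", "y"): 1, ("B", "x"): 2, ("B", "y"): 2}
records = [("A", "x", 10.0), ("A", "y", 14.0), ("A", "x", None),
           ("B", "x", 3.0), ("B", "y", None)]
print(impute_group_means(records, group_of))
```

The benefit of grouping shows in the ("B", "y") cell. It has no observed values of its own, but it borrows the mean of its group. If groups are homogeneous in the response mean, as the abstract states IlocA's groups are, this borrowing adds little bias.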

---
Source: https://tomesphere.com/paper/2302.11916