# An information criterion for auxiliary variable selection in incomplete data analysis

**Authors:** Shinpei Imori, Hidetoshi Shimodaira

arXiv: 1902.07954 · 2019-03-27

## TL;DR

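This paper introduces an information criterion for selecting useful auxiliary variables in incomplete data analysis, so that a joint model of primary and auxiliary variables improves estimation for the primary variables when the auxiliary variables are closely related to them, without the loss of accuracy caused by irrelevant ones.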

## Contribution

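It frames auxiliary variable selection as a model selection problem and proposes an information criterion that, under some reasonable conditions, is an asymptotically unbiased estimator of the Kullback-Leibler divergence for complete data of the primary variables. It also shows that the criterion is asymptotically equivalent to a variant of leave-one-out cross validation.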

## Key findings

- The criterion effectively identifies relevant auxiliary variables in simulations.
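- Modeling the joint distribution of primary and auxiliary variables improves estimation for the primary variables when the auxiliary variables are closely related to them, and reduces accuracy when they are irrelevant.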
- Performance is also demonstrated on a real data example.

## Abstract

Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not observed. Utilizing a parametric model of joint distribution of primary and auxiliary variables, it is possible to improve the estimation of parametric model for the primary variables when the auxiliary variables are closely related to the primary variables. However, the estimation accuracy reduces when the auxiliary variables are irrelevant to the primary variables. For selecting useful auxiliary variables, we formulate the problem as model selection, and propose an information criterion for predicting primary variables by leveraging auxiliary variables. The proposed information criterion is an asymptotically unbiased estimator of the Kullback-Leibler divergence for complete data of primary variables under some reasonable conditions. We also clarify an asymptotic equivalence between the proposed information criterion and a variant of leave-one-out cross validation. Performance of our method is demonstrated via a simulation study and a real data example.
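The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's information criterion. It assumes a bivariate Gaussian toy model in which the primary variable `y` is observed for only the first `n_obs` of `n` cases while the auxiliary variable `x` is always observed, and it compares the mean of the observed `y` values with the classical regression-type maximum likelihood estimator for this monotone missing-data pattern. The function name `simulate_mse` and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_mse(rho, n=100, n_obs=30, n_rep=2000, seed=0):
    """Monte Carlo MSE of two estimators of E[y] (true value 0).

    Toy setting (an assumption, not taken from the paper): (y, x) is
    bivariate normal with unit variances and correlation rho; y is
    observed only for the first n_obs cases, x for all n cases.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    sq_err_primary_only = np.empty(n_rep)
    sq_err_joint = np.empty(n_rep)

    for r in range(n_rep):
        data = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y, x = data[:, 0], data[:, 1]
        y_obs, x_obs = y[:n_obs], x[:n_obs]

        # Estimator 1: ignore the auxiliary variable entirely.
        mu_primary_only = y_obs.mean()

        # Estimator 2: joint Gaussian model. For this missingness
        # pattern the MLE of E[y] adjusts the observed mean by the
        # regression slope times the shift in the mean of x.
        slope = np.cov(x_obs, y_obs, ddof=1)[0, 1] / np.var(x_obs, ddof=1)
        mu_joint = y_obs.mean() + slope * (x.mean() - x_obs.mean())

        sq_err_primary_only[r] = mu_primary_only ** 2
        sq_err_joint[r] = mu_joint ** 2

    return sq_err_primary_only.mean(), sq_err_joint.mean()


for rho in (0.9, 0.0):
    mse_primary_only, mse_joint = simulate_mse(rho)
    print(f"rho={rho}: primary-only MSE={mse_primary_only:.4f}, "
          f"joint-model MSE={mse_joint:.4f}")
```

With a strongly correlated auxiliary variable the joint-model estimator should show a clearly smaller error, while with an uncorrelated one it only adds estimation noise from the slope. Deciding between the two models from the data, rather than from a known `rho`, is the selection problem the proposed criterion addresses.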

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.07954/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1902.07954/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1902.07954/full.md

---
Source: https://tomesphere.com/paper/1902.07954