Information-theoretic evaluation of covariate distributions models

Niklas Hartung; Aleksandra Khatova

arXiv:2406.10611 · stat.AP · March 20, 2025

TL;DR

This paper evaluates covariate distribution models using an information-theoretic measure, demonstrating the advantages of non-Gaussian models like copulas and MICE in life science data, and introduces a new confidence interval construction for KL divergence.

Contribution

It provides a systematic comparison of covariate distribution models with a novel method for confidence interval estimation of KL divergence, highlighting their strengths and limitations.

Findings

01. Non-Gaussian models achieve lower KL divergence (KL-D) than Gaussian models across datasets.

02. Copula models generalize well to new data, whereas MICE tends to overfit.

03. Parametric copulas and MICE scale better with data dimension.

Abstract

Statistical modelling of covariate distributions makes it possible to generate virtual populations or to impute missing values in a covariate dataset. Covariate distributions typically have non-Gaussian margins and show nonlinear correlation structures, which simple multivariate Gaussian distributions fail to represent. Prominent non-Gaussian frameworks for covariate distribution modelling are copula-based models and models based on multiple imputation by chained equations (MICE). While both frameworks have already found applications in the life sciences, a systematic investigation of their goodness-of-fit to the theoretical underlying distribution, indicating strengths and weaknesses under different conditions, is still lacking. To bridge this gap, we thoroughly evaluated covariate distribution models in terms of Kullback-Leibler divergence (KL-D), a scale-invariant information-theoretic…
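
To make the scale-invariance property mentioned in the abstract concrete, the following is a minimal sketch, not the authors' estimator. It uses the standard closed-form KL divergence between two multivariate Gaussians and checks numerically that the value does not change when every covariate is rescaled (for example, by a change of units). The function name `gaussian_kl`, the example parameters, and the scaling matrix `S` are illustrative assumptions. The paper's evaluation of non-Gaussian models and its confidence interval construction are not reproduced here.

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(P || Q) for P = N(mu_p, cov_p) and Q = N(mu_q, cov_q)."""
    mu_p, mu_q = np.asarray(mu_p, float), np.asarray(mu_q, float)
    cov_p, cov_q = np.asarray(cov_p, float), np.asarray(cov_q, float)
    d = mu_p.shape[0]
    diff = mu_q - mu_p
    # tr(cov_q^{-1} cov_p), computed without forming the explicit inverse
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    # Mahalanobis distance between the means under cov_q
    maha_term = diff @ np.linalg.solve(cov_q, diff)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term + maha_term - d + logdet_q - logdet_p)

# Hypothetical "true" (P) and "model" (Q) distributions for two covariates.
mu_p, cov_p = np.array([0.0, 0.0]), np.array([[1.0, 0.5], [0.5, 2.0]])
mu_q, cov_q = np.array([1.0, -1.0]), np.array([[2.0, 0.0], [0.0, 1.0]])

# Rescale each covariate separately (illustrative unit change).
S = np.diag([100.0, 0.1])

kl_original = gaussian_kl(mu_p, cov_p, mu_q, cov_q)
kl_rescaled = gaussian_kl(S @ mu_p, S @ cov_p @ S.T, S @ mu_q, S @ cov_q @ S.T)

print("KL-D, original units:", kl_original)
print("KL-D, rescaled units:", kl_rescaled)
```

The two printed values agree up to floating-point error because the rescaling is applied to both distributions. This is what makes KL-D comparable across covariates measured on very different scales.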

Taxonomy

Topics: Forecasting Techniques and Applications