Information-theoretic evaluation of covariate distributions models
Niklas Hartung, Aleksandra Khatova

TL;DR
This paper evaluates covariate distribution models using an information-theoretic measure, demonstrating the advantages of non-Gaussian models like copulas and MICE in life science data, and introduces a new confidence interval construction for KL divergence.
Contribution
The paper systematically compares covariate distribution models and highlights the strengths and limitations of each. It also proposes a novel method for constructing confidence intervals for the KL divergence.
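The interval construction proposed in the paper is not described in this summary. The sketch below therefore shows only a generic baseline and is not the authors' method. When both densities can be evaluated, KL(p‖q) = E_p[log p(X) − log q(X)] can be estimated by the sample mean of log density ratios over draws from p, and the central limit theorem then gives a normal-approximation confidence interval. The function name `kl_monte_carlo_ci` and the example pair of Gaussians are hypothetical illustrations.

```python
import numpy as np
from scipy import stats


def kl_monte_carlo_ci(log_p, log_q, sampler, n=200_000, level=0.95, seed=0):
    """Monte Carlo estimate of KL(p || q) with a CLT-based confidence interval.

    Hypothetical helper; a generic baseline, NOT the paper's interval construction.
    log_p, log_q: callables returning log-densities; sampler(rng, n) draws from p.
    """
    rng = np.random.default_rng(seed)
    x = sampler(rng, n)
    terms = log_p(x) - log_q(x)          # per-sample log density ratios
    estimate = terms.mean()              # sample mean approximates E_p[log p - log q]
    se = terms.std(ddof=1) / np.sqrt(n)  # standard error of the sample mean
    z = stats.norm.ppf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return estimate, (estimate - z * se, estimate + z * se)


# Example: p = N(0, 1), q = N(1, 2^2); KL has a closed form for this pair
p = stats.norm(loc=0.0, scale=1.0)
q = stats.norm(loc=1.0, scale=2.0)
est, (lo, hi) = kl_monte_carlo_ci(
    p.logpdf, q.logpdf, lambda rng, n: p.rvs(size=n, random_state=rng)
)
# Univariate Gaussians: log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2
exact = np.log(2.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5
print(f"estimate={est:.4f}, 95% CI=({lo:.4f}, {hi:.4f}), exact={exact:.4f}")
```

This interval relies on the log-ratio terms having finite variance. It is also unavailable when the true density p cannot be evaluated, as is the case for real covariate data. Handling that case is presumably where a dedicated construction such as the paper's is needed.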
Findings
Non-Gaussian models outperform the Gaussian model in terms of KL-D across datasets.
Copula models generalize well to new data, whereas MICE tends to overfit.
Parametric copulas and MICE scale better with data dimension.
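A copula model separates each covariate's margin from the dependence structure between covariates. As an illustration, the sketch below uses a Gaussian copula with empirical margins. This is one common semi-parametric choice, assumed here for illustration, and the copula families evaluated in the paper may differ. Column ranks are mapped to normal scores, their correlation matrix is estimated, and a virtual population is drawn by sampling correlated normals and pushing them back through each column's empirical quantile function. The helper names and the simulated age/weight covariates are hypothetical.

```python
import numpy as np
from scipy import stats


def fit_gaussian_copula(data):
    """Estimate the Gaussian-copula correlation matrix from normal scores."""
    n = data.shape[0]
    u = stats.rankdata(data, axis=0) / (n + 1)  # pseudo-observations in (0, 1)
    z = stats.norm.ppf(u)                       # standard-normal scores
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(data, corr, n_samples, seed=0):
    """Draw a virtual population: Gaussian dependence, empirical margins."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = stats.norm.cdf(z)  # uniform margins that carry the copula dependence
    # Map each column back through the empirical quantile function of the data
    return np.column_stack([np.quantile(data[:, j], u[:, j]) for j in range(d)])


# Hypothetical covariates: age (uniform) and a right-skewed body weight
rng = np.random.default_rng(1)
age = rng.uniform(20, 80, size=2_000)
weight = np.exp(np.log(70) + 0.01 * (age - 50) + 0.2 * rng.standard_normal(2_000))
data = np.column_stack([age, weight])

corr = fit_gaussian_copula(data)
virtual = sample_gaussian_copula(data, corr, n_samples=5_000)
rho_data, _ = stats.spearmanr(data[:, 0], data[:, 1])
rho_virtual, _ = stats.spearmanr(virtual[:, 0], virtual[:, 1])
print(f"Spearman rho: data={rho_data:.2f}, virtual={rho_virtual:.2f}")
```

Because the margins are empirical, the skewed weight distribution is reproduced without assuming normality. The price is that virtual values never leave the observed range, which is one way a model can fit the training data closely yet generalize poorly.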
Abstract
Statistical modelling of covariate distributions makes it possible to generate virtual populations or to impute missing values in a covariate dataset. Covariate distributions typically have non-Gaussian margins and show nonlinear correlation structures, which simple multivariate Gaussian distributions fail to represent. Prominent non-Gaussian frameworks for covariate distribution modelling are copula-based models and models based on multiple imputation by chained equations (MICE). While both frameworks have already found applications in the life sciences, a systematic investigation of their goodness of fit to the underlying theoretical distribution, indicating strengths and weaknesses under different conditions, is still lacking. To bridge this gap, we thoroughly evaluated covariate distribution models in terms of Kullback-Leibler divergence (KL-D), a scale-invariant information-theoretic…
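Two points in the abstract can be illustrated with a small sketch: the Gaussian model's misfit on non-Gaussian margins, and the scale invariance of KL-D. The example assumes a hypothetical one-dimensional covariate whose true distribution is lognormal. A Gaussian model is fitted to it by maximum likelihood, and KL(p‖q) is estimated by Monte Carlo. Changing units, for example from kilograms to grams, rescales both the true distribution and the refitted Gaussian. KL-D is invariant under such an invertible transformation applied to both distributions, so the estimate should not change. The function name and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def kl_lognormal_vs_fitted_gaussian(sigma=0.5, unit_factor=1.0,
                                    n_fit=5_000, n_mc=100_000, seed=0):
    """Monte Carlo estimate of KL(p || q) for a hypothetical covariate.

    p: 'true' lognormal distribution of X, expressed in units scaled by
       unit_factor (i.e. the law of unit_factor * X).
    q: Gaussian model fitted by maximum likelihood to n_fit draws from p.
    """
    rng = np.random.default_rng(seed)
    p = stats.lognorm(s=sigma, scale=unit_factor)  # law of unit_factor * X
    x_fit = p.rvs(size=n_fit, random_state=rng)
    q = stats.norm(loc=x_fit.mean(), scale=x_fit.std())  # ddof=0 gives the MLE
    x_mc = p.rvs(size=n_mc, random_state=rng)
    # KL(p || q) = E_p[log p(X) - log q(X)], approximated by a sample mean
    return np.mean(p.logpdf(x_mc) - q.logpdf(x_mc))


kl_kg = kl_lognormal_vs_fitted_gaussian(unit_factor=1.0)    # e.g. kilograms
kl_g = kl_lognormal_vs_fitted_gaussian(unit_factor=1000.0)  # same draws in grams
print(f"KL in kg: {kl_kg:.4f}, KL in g: {kl_g:.4f}")
```

With the same seed, the two estimates agree up to floating-point rounding. The unit change shifts log p and log q by the same constant, −log(unit_factor), so their difference is unchanged. Both values are clearly positive, which reflects the skewed margin that a Gaussian cannot capture.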
Taxonomy
Topics: Forecasting Techniques and Applications
