Clustering and Prediction with Variable Dimension Covariates

Garritt L. Page; Fernando A. Quintana; Peter Müller

arXiv:1912.13119·stat.ME·July 14, 2020·J. Comput. Graph. Stat.

TL;DR

This paper introduces a covariate-dependent partition model that handles missing covariates without any imputation and supports prediction for new subjects, even when their missingness pattern was not seen in the training data.

Contribution

It presents a method for prediction with incomplete covariates that accommodates both categorical and continuous covariates and compares favorably in simulation studies.

Findings

01

Method handles missing covariates without imputation.

02

Performs well in simulations and real applications.

03

Supports both in-sample and out-of-sample predictions.

Abstract

In many applied fields, incomplete covariate vectors are commonly encountered. It is well known that this can be problematic when making inference on model parameters, but its impact on prediction performance is less understood. We develop a method based on covariate-dependent partition models that seamlessly handles missing covariates while completely avoiding any type of imputation. The method allows in-sample as well as out-of-sample prediction, even if the missingness pattern in a new subject's incomplete covariate vector was not seen in the training data. Covariates of any data type, including categorical and continuous, are permitted. In simulation studies the proposed method compares favorably. We illustrate the method in two application examples.
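
To make the idea of avoiding imputation concrete, the following is a minimal sketch of how a cluster-level similarity score could be computed from observed covariate entries only. It is an illustration under stated assumptions, not the authors' model: the function names (`covariate_similarity`, `cluster_similarity`) and the `exp(-variance)` scoring rule are hypothetical and chosen only for simplicity. Missing entries are encoded as `None`.

```python
import math


def observed(values):
    """Keep only the entries that are actually observed (not None)."""
    return [v for v in values if v is not None]


def covariate_similarity(values):
    """Hypothetical similarity for one continuous covariate within a cluster.

    Only observed entries contribute. With fewer than two observed values
    the score is 1.0 (neutral), so nothing ever has to be filled in.
    """
    obs = observed(values)
    if len(obs) < 2:
        return 1.0
    mean = sum(obs) / len(obs)
    spread = sum((v - mean) ** 2 for v in obs) / len(obs)
    # Assumed scoring rule: tighter clusters score closer to 1.
    return math.exp(-spread)


def cluster_similarity(rows):
    """Product of per-covariate similarities for a cluster given as rows."""
    score = 1.0
    for column in zip(*rows):
        score *= covariate_similarity(list(column))
    return score


# Three subjects, two covariates, each subject missing a different entry.
cluster = [
    [1.0, None],
    [1.0, 2.0],
    [None, 2.0],
]
print(cluster_similarity(cluster))
```

Because each covariate is scored from whichever subjects report it, a new subject with a missingness pattern absent from the training data can still be compared with every cluster through the covariates it does have.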
