Clustering and Prediction with Variable Dimension Covariates
Garritt L. Page, Fernando A. Quintana, Peter Müller

TL;DR
This paper introduces a covariate-dependent partition model that handles missing covariate values without imputation and supports prediction for subjects with incomplete covariate vectors.
Contribution
It presents a novel method for prediction with incomplete covariates that accommodates any covariate type and compares favorably with existing approaches in simulation studies.
Findings
Method handles missing covariates without imputation.
Compares favorably in simulation studies and is illustrated in two applications.
Supports both in-sample and out-of-sample predictions.
Abstract
In many applied fields, incomplete covariate vectors are commonly encountered. It is well known that this can be problematic when making inference on model parameters, but its impact on prediction performance is less understood. We develop a method based on covariate-dependent partition models that seamlessly handles missing covariates while completely avoiding any type of imputation. The method allows in-sample as well as out-of-sample predictions, even if the missingness pattern in a new subject's incomplete covariate vector was not seen in the training data. Covariates of any data type, including categorical and continuous, are permitted. In simulation studies the proposed method compares favorably. We illustrate the method in two application examples.
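The core idea, clustering on covariates without filling in missing values, can be illustrated with a small sketch. It assumes a PPMx-style product partition prior with a Dirichlet-process cohesion (mass M) and auxiliary similarity functions: a Normal-Normal marginal for continuous covariates and a Dirichlet-multinomial marginal for categorical ones. These choices, the hyperparameter defaults, and every function name below are illustrative assumptions, not the authors' implementation. Each cluster's similarity is evaluated only on the covariate entries that are observed, so a missing entry (None) drops out instead of being imputed, and a new subject is scored even if its missingness pattern never appeared in the training data. The sketch returns only the covariate-driven allocation probabilities; a full predictive distribution would combine them with cluster-specific response models.

```python
import math

# Hypothetical sketch: PPMx-style allocation where similarity uses only
# observed covariate entries (None marks a missing value).

def log_normal_marginal(xs, m0=0.0, s2=1.0, v2=1.0):
    # Log marginal of xs under x_i ~ N(mu, s2), mu ~ N(m0, v2).
    n = len(xs)
    if n == 0:
        return 0.0
    d = [x - m0 for x in xs]
    sd, sd2 = sum(d), sum(v * v for v in d)
    denom = s2 + n * v2
    return (-0.5 * n * math.log(2 * math.pi * s2)
            + 0.5 * math.log(s2 / denom)
            - 0.5 / s2 * (sd2 - v2 * sd * sd / denom))

def log_dirmult_marginal(cs, n_cat, alpha=1.0):
    # Log marginal of category labels cs (ints in range(n_cat)) under a
    # symmetric Dirichlet(alpha) prior on the category probabilities.
    n = len(cs)
    out = math.lgamma(n_cat * alpha) - math.lgamma(n_cat * alpha + n)
    for k in range(n_cat):
        n_k = sum(1 for c in cs if c == k)
        out += math.lgamma(alpha + n_k) - math.lgamma(alpha)
    return out

def log_similarity(rows, kinds):
    # kinds[p] is "cont" or an int (number of categories) for covariate p.
    total = 0.0
    for p, kind in enumerate(kinds):
        observed = [r[p] for r in rows if r[p] is not None]  # skip missing
        if kind == "cont":
            total += log_normal_marginal(observed)
        else:
            total += log_dirmult_marginal(observed, kind)
    return total

def allocation_probs(clusters, x_new, kinds, M=1.0):
    # Prior predictive allocation of a new subject under the cohesion
    # c(S) = M * (|S| - 1)!: existing clusters first, then a new cluster.
    logw = []
    for rows in clusters:
        logw.append(math.log(len(rows))
                    + log_similarity(rows + [x_new], kinds)
                    - log_similarity(rows, kinds))
    logw.append(math.log(M) + log_similarity([x_new], kinds))
    mx = max(logw)
    w = [math.exp(v - mx) for v in logw]
    s = sum(w)
    return [v / s for v in w]

# Usage: one continuous covariate and one 3-level categorical covariate.
kinds = ["cont", 3]
clusters = [
    [(-2.1, 0), (-1.8, None), (-2.4, 0)],
    [(2.0, 2), (None, 2), (1.7, 2)],
]
print(allocation_probs(clusters, (None, 2), kinds))   # ≈ [0.205, 0.682, 0.114]
print(allocation_probs(clusters, (-2.0, None), kinds))  # favors the first cluster
```

Because the similarity ratio for an unobserved covariate equals one, a missing entry neither helps nor hurts any cluster; the allocation is driven entirely by whichever covariates the subject actually has.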
