Interpretable Prediction Rule Ensembles in the Presence of Missing Data
Vincent Schroeder, Jakob Schwerter, Marjolein Fokkema, Philipp Doebler

TL;DR
This paper evaluates how different imputation methods affect the performance and interpretability of Prediction Rule Ensembles when data contain missing values, and proposes a data stacking approach for combining multiply imputed datasets.
Contribution
It introduces a data stacking approach for combining multiple imputed datasets in PREs and compares various imputation techniques under realistic conditions.
Findings
MIXGBoost and MICE PMM achieve high rule recovery but increase false positives.
MICE RF and missRanger promote rule sparsity and simplicity.
Avoiding overly coarse variable rounding reduces model size with minimal performance loss.
Abstract
Prediction Rule Ensembles (PREs) are robust and interpretable statistical learning techniques with potential for predictive analytics, yet their efficacy in the presence of missing data is untested. This study uses multiple imputation to fill in missing values but combines the results through data stacking rather than traditional model pooling. We perform a simulation study to compare imputation methods under realistic conditions, focusing on sample sizes of and across 1,000 replications. Evaluated techniques include multiple imputation by chained equations with predictive mean matching (MICE PMM), MICE with Random Forest (MICE RF), Random Forest imputation with the ranger algorithm (missRanger), and imputation using extreme gradient boosting (MIXGBoost), with results compared to listwise deletion. Because stacking multiple imputed datasets can…
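The contrast between stacking and pooling can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' implementation: the helper names `impute_once` and `stack_imputations` are hypothetical, the imputation step is a simple random hot-deck draw standing in for MICE PMM, MICE RF, missRanger, or MIXGBoost, and the per-row weight of 1/m is an assumed convention (so the stacked data carry the total weight of the original sample), not a detail confirmed by the abstract.

```python
import random


def impute_once(rows, rng):
    """Return one completed copy of `rows`, filling each None with a random
    draw from the observed values of the same column.

    Assumption: this hot-deck draw is only a stand-in for a real imputation
    method. It raises IndexError if a column has no observed values.
    """
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(r)]
        for r in rows
    ]


def stack_imputations(rows, m=5, seed=0):
    """Create m imputed copies and stack them row-wise into one long dataset.

    Each stacked row gets weight 1/m (an assumed convention), so the weights
    sum to the original number of rows.
    """
    rng = random.Random(seed)
    stacked, weights = [], []
    for _ in range(m):
        completed = impute_once(rows, rng)
        stacked.extend(completed)
        weights.extend([1.0 / m] * len(completed))
    return stacked, weights


# Toy data with missing entries marked as None.
data = [[1.0, 2.0], [None, 3.0], [4.0, None], [5.0, 6.0]]
stacked, weights = stack_imputations(data, m=3)
```

Under stacking, a single rule ensemble would then be fit once to `stacked` (optionally using `weights`), which yields one set of rules to interpret. Under model pooling, a separate model would be fit to each of the m completed datasets and the results combined afterwards, which is harder to reconcile when each fit selects different rules.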
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Neural Networks and Applications · Hydrological Forecasting Using AI
