Model Selection for Causal Modeling in Missing Exposure Problems
Yuliang Shi, Yeying Zhu, Joel A. Dubin

TL;DR
This paper investigates model selection for causal inference when the exposure is missing at random, and proposes a new criterion called 'rank score' for choosing the imputation and propensity score models that give the most accurate causal effect estimate.
Contribution
It introduces a novel 'rank score' criterion for jointly selecting the imputation and propensity score (PS) models in causal inference when the exposure is missing at random (MAR).
Findings
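In the simulation study, the full imputation model combined with outcome-related PS models gives the smallest RMSE of the estimated causal effect.
The rank score can help select the best-performing imputation and PS models.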
Application demonstrates causal effect estimation in COVID-19 mortality.
Abstract
In causal inference, properly selecting the propensity score (PS) model is an important topic and has been widely investigated in observational studies. There is also a large literature focusing on the missing data problem. However, there are very few studies investigating the model selection issue for causal inference when the exposure is missing at random (MAR). In this paper, we discuss how to select both imputation and PS models, which can result in the smallest root mean squared error (RMSE) of the estimated causal effect in our simulation study. Then, we propose a new criterion, called "rank score", for evaluating the overall performance of both models. The simulation studies show that the full imputation plus the outcome-related PS models lead to the smallest RMSE and the rank score can help select the best models. An application study is conducted to quantify the causal effect…
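The workflow the abstract describes (impute the missing exposure, fit a PS model, estimate the causal effect, compare candidate models by RMSE over simulated data sets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation design: the data-generating process, the coefficient values, the single stochastic imputation step, the normalized inverse probability weighting estimator, and the names `one_replicate` and `rmse_by_ps_model` are all hypothetical. The sketch reads "full imputation" as an imputation model that uses all covariates and the outcome, and "outcome-related PS model" as a PS model that includes every covariate predicting the outcome; both readings are assumptions. The rank score itself is not defined in the text above, so it is not implemented here.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)
        beta = beta + np.linalg.solve((X * w[:, None]).T @ X, X.T @ (y - p))
    return beta


def design(*cols):
    """Stack columns into a design matrix with a leading intercept."""
    return np.column_stack([np.ones(len(cols[0])), *cols])


def ipw_effect(a, y, ps):
    """Normalized (Hajek) inverse probability weighting estimate of the average effect."""
    w1, w0 = a / ps, (1 - a) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def one_replicate(rng, n=2000, true_effect=1.0):
    # Hypothetical data-generating process (all coefficients are assumptions).
    x1 = rng.normal(size=n)                      # confounder: affects exposure and outcome
    x2 = rng.normal(size=n)                      # affects the outcome only
    a = rng.binomial(1, expit(0.5 * x1))         # binary exposure
    y = true_effect * a + x1 + x2 + rng.normal(size=n)

    # MAR: whether the exposure is observed depends only on observed x1 and y.
    observed = rng.binomial(1, expit(1.0 + 0.5 * x1 - 0.3 * y)).astype(bool)

    # Imputation model using all covariates and the outcome, fit on observed cases.
    X_imp = design(x1, x2, y)
    beta_imp = fit_logistic(X_imp[observed], a[observed])
    a_imp = a.copy()
    a_imp[~observed] = rng.binomial(1, expit(X_imp[~observed] @ beta_imp))

    # Two candidate PS models fit on the completed exposure.
    candidates = {
        "confounder_only": design(x1),
        "outcome_related": design(x1, x2),
    }
    estimates = {}
    for name, X_ps in candidates.items():
        ps = expit(X_ps @ fit_logistic(X_ps, a_imp))
        estimates[name] = ipw_effect(a_imp, y, ps)
    return estimates


def rmse_by_ps_model(n_rep=50, seed=0, true_effect=1.0):
    """RMSE of the effect estimate for each candidate PS model across replicates."""
    rng = np.random.default_rng(seed)
    errors = {}
    for _ in range(n_rep):
        for name, est in one_replicate(rng, true_effect=true_effect).items():
            errors.setdefault(name, []).append(est - true_effect)
    return {name: float(np.sqrt(np.mean(np.square(e)))) for name, e in errors.items()}


rmse = rmse_by_ps_model()
for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.3f}")
```

RMSE is only computable here because the true effect is known by construction. In an applied analysis the truth is unknown, which is the situation a data-driven selection criterion is meant to address.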
Taxonomy
Topics: Fault Detection and Control Systems · Bayesian Modeling and Causal Inference
