Robust prediction under missingness shifts
Patrick Rockenschaub, Zhicong Xian, Alireza Zamanian, Marta Piperno, Octavia-Andreea Ciora, Elisabeth Pachl, Narges Ahmidi

TL;DR
This paper investigates how missing data and shifts in missingness mechanisms affect predictive modeling, emphasizing the robustness of the Bayes predictor under certain conditions and the importance of handling missingness informatively.
Contribution
The paper analyzes how missingness shifts affect prediction, shows that the Bayes predictor remains robust under ignorable shifts, and reports empirical findings on how robust different methods are.
Findings
Bayes predictor remains unchanged under ignorable missingness shifts.
Disregarding missingness can be beneficial when missingness is highly informative.
Different methods show varying robustness to different missingness shifts.
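The first finding can be illustrated with a small simulation. The sketch below is not the paper's experiment: the data-generating process (two correlated Gaussian covariates, a linear outcome), the correlation `RHO`, the missingness functions, and the helper `fit_when_x2_missing` are all assumptions chosen for illustration. It fits a predictor for the pattern "x2 is missing" in a source and a target setting, once with missingness that depends only on the always-observed x1 (ignorable) and once with missingness that depends on x2 itself (non-ignorable).

```python
import numpy as np

RHO = 0.5  # assumed correlation between x1 and x2 (illustrative choice)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_when_x2_missing(p_missing, n=200_000, seed=0):
    """Fit y ~ x1 by least squares on the rows where x2 is missing.

    p_missing(x1, x2) returns each row's probability that x2 is missing.
    Returns (slope, intercept) of the fitted line.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = RHO * x1 + np.sqrt(1 - RHO**2) * rng.standard_normal(n)
    y = x1 + x2 + 0.1 * rng.standard_normal(n)
    missing = rng.random(n) < p_missing(x1, x2)  # True where x2 is missing
    slope, intercept = np.polyfit(x1[missing], y[missing], deg=1)
    return slope, intercept


# Ignorable shift: missingness of x2 depends only on the observed x1,
# but the dependence flips between source and target.
mar_source = fit_when_x2_missing(lambda x1, x2: sigmoid(2 * x1), seed=1)
mar_target = fit_when_x2_missing(lambda x1, x2: sigmoid(-2 * x1), seed=2)

# Non-ignorable shift: missingness of x2 depends on the value of x2 itself.
mnar_source = fit_when_x2_missing(lambda x1, x2: (x2 > 0).astype(float), seed=3)
mnar_target = fit_when_x2_missing(lambda x1, x2: (x2 < 0).astype(float), seed=4)

print("ignorable     source/target (slope, intercept):", mar_source, mar_target)
print("non-ignorable source/target (slope, intercept):", mnar_source, mnar_target)
```

Under these assumptions, the ignorable case yields nearly the same fitted line in source and target (slope close to 1 + RHO, intercept close to 0), whereas the non-ignorable case yields intercepts of opposite sign, so a predictor fitted on the source is biased on the target.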
Abstract
Prediction becomes more challenging with missing covariates. What method is chosen to handle missingness can greatly affect how models perform. In many real-world problems, the best prediction performance is achieved by models that can leverage the informative nature of a value being missing. Yet, the reasons why a covariate goes missing can change once a model is deployed in practice. If such a missingness shift occurs, the conditional probability of a value being missing differs in the target data. Prediction performance in the source data may no longer be a good selection criterion, and approaches that do not rely on informative missingness may be preferable. However, we show that the Bayes predictor remains unchanged by ignorable shifts for which the probability of missingness only depends on observed data. Any consistent estimator of the Bayes predictor may therefore result in…
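The claim about ignorable shifts can be sketched in one short derivation. The notation here is assumed for illustration and may differ from the paper's: $M$ is the missingness pattern, $x_{\mathrm{obs}(m)}$ are the covariates observed under pattern $m$, and $S$ indexes the environment (source or target). The sketch further assumes that the joint distribution of covariates and outcome is the same in both environments, and that the probability of a pattern depends only on the covariates observed under it, not on the outcome or on unobserved values.

```latex
% Sketch under the assumptions stated above (notation is illustrative).
\begin{align*}
p_S\bigl(y \mid x_{\mathrm{obs}(m)}, M = m\bigr)
  &= \frac{P_S\bigl(M = m \mid y, x_{\mathrm{obs}(m)}\bigr)\, p\bigl(y \mid x_{\mathrm{obs}(m)}\bigr)}
          {P_S\bigl(M = m \mid x_{\mathrm{obs}(m)}\bigr)}
     && \text{(Bayes' rule)} \\
  &= \frac{P_S\bigl(M = m \mid x_{\mathrm{obs}(m)}\bigr)\, p\bigl(y \mid x_{\mathrm{obs}(m)}\bigr)}
          {P_S\bigl(M = m \mid x_{\mathrm{obs}(m)}\bigr)}
     && \text{(ignorability)} \\
  &= p\bigl(y \mid x_{\mathrm{obs}(m)}\bigr).
\end{align*}
```

The final expression no longer depends on $S$, so under these assumptions the conditional expectation $\mathbb{E}[Y \mid x_{\mathrm{obs}(m)}, M = m]$ is the same in source and target even though the missingness mechanism differs.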
Peer Reviews
Decision·Submitted to ICLR 2024
The paper is well-written and has a very clear organization. The proposed method, NeuMISE, seems to be simple but effective and to outperform other baselines. The results are relatively complete and solid.
The paper uses quite a lot of space to discuss the missingness shift. Although these descriptions are complete and clear, they seem relatively elementary and do not provide enough new intellectual insights. Under the ignorable condition, the Theorem 1 "equivalence" is also straightforward and hence not surprising, at least to me. The last few sentences in Section 5.1 confuse me. What are "adjusting Y", "omitting Y", and the definition of "stable estimator"? Section 6 is rather short. It should be expanded t
1. Finding ways to cope with non-ignorable distribution shifts (shifts in the conditional distribution of Y|X) is an important and challenging problem and has not received as much attention as covariate distribution shift, so it’s great that this work points out that methods for dealing with missingness and ignorable missingness shift are insufficient in the presence of non-ignorable missingness shift.
2. The paper is clear and well-written.
1. The main weakness of this work is that it does not cite or discuss its connection to [1], which is another work that studies robustness to missingness shift. This paper describes that when missing data indicators are available, domain adaptation under missingness shift reduces to a covariate shift problem. This finding seems to be related to one of the central contributions of this paper, which is that the optimal predictor remains unchanged if missingness only depends on observables in both
* This work tackles an important but under-emphasized problem with simple but powerful theoretical results. The core contribution regarding the formalization of missingness shift and discussion of ignorable shifts is a strong contribution and has potential for broad use in applications.
* The experiments are broad and cover a number of data generating processes, shift mechanisms, and comparator methods.
* The motivation for the NeuMISE method is not presented clearly enough or with enough detail to tie it to the rest of the core claims of the work. It is primarily not clear why modifying the masking of NeuMiss is well-motivated to address the issue of generalizing across unobserved missingness patterns.
* I have several concerns regarding the clarity of the work, which are elaborated on in the Questions section below.
Taxonomy
Topics: Multi-Criteria Decision Making · Fuzzy Systems and Optimization · Neural Networks and Applications
