On regression and classification with possibly missing response variables in the data
Majid Mojirsheibani, William Pouliot, Andre Shakhbandaryan

TL;DR
This paper develops a kernel regression and classification framework for data whose responses may be missing, where the missingness mechanism is unknown and may depend on both the predictors and the response, and it provides theoretical guarantees for the resulting estimators.
Contribution
It introduces a novel two-step approach for handling missing response variables in kernel methods, with theoretical analysis and performance bounds.
Findings
Derived exponential bounds on estimator deviations in Lp norms.
Established strong convergence results for the proposed estimators.
Extended the methodology to classification and other local-averaging methods.
Abstract
This paper considers the problem of kernel regression and classification with possibly unobservable response variables in the data, where the mechanism that causes the absence of information is unknown and can depend on both predictors and the response variables. Our proposed approach involves two steps: In the first step, we construct a family of models (possibly infinite dimensional) indexed by the unknown parameter of the missing probability mechanism. In the second step, a search is carried out to find the empirically optimal member of an appropriate cover (or subclass) of the underlying family in the sense of minimizing the mean squared prediction error. The main focus of the paper is to look into the theoretical properties of these estimators. The issue of identifiability is also addressed. Our methods use a data-splitting approach which is quite easy to implement. We also derive…
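The two steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the exponential-tilt form phi(y) = exp(gamma * y) for the missingness mechanism, the Gaussian kernel, the fixed bandwidth h, the finite grid of gamma values standing in for the "cover", and the simplification that the held-out split has fully observed responses. The function names tilted_kernel_regression and select_gamma are hypothetical.

```python
import numpy as np


def tilted_kernel_regression(x_train, y_train, delta, x_query, gamma, h):
    """Kernel estimate of E[Y | X = x] for one candidate gamma.

    Assumes (for illustration only) that
    P(delta = 0 | x, y) / P(delta = 1 | x, y) = exp(g(x)) * exp(gamma * y),
    which gives
    E[Y | X] = E[delta*Y | X] + E[delta*Y*phi | X] / E[delta*phi | X] * E[1 - delta | X].
    """
    x_train = np.asarray(x_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # Missing responses are never used: replace them by 0 where delta == 0.
    y_obs = np.where(delta == 1, np.asarray(y_train, dtype=float), 0.0)

    # Gaussian kernel weights, shape (n_query, n_train).
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    denom = w.sum(axis=1)

    phi = np.exp(gamma * y_obs)             # assumed tilt function
    eta1 = (w @ delta) / denom              # estimates E[delta | X]
    eta0 = (w @ (delta * y_obs)) / denom    # estimates E[delta * Y | X]
    psi1 = w @ (delta * y_obs * phi)        # proportional to E[delta * Y * phi | X]
    psi2 = w @ (delta * phi)                # proportional to E[delta * phi | X]
    return eta0 + (psi1 / psi2) * (1.0 - eta1)


def select_gamma(x_fit, y_fit, d_fit, x_val, y_val, gammas, h):
    """Step 2: pick the gamma on the grid with the smallest held-out squared error.

    Simplifying assumption of this sketch: y_val is fully observed.
    """
    errors = []
    for g in gammas:
        pred = tilted_kernel_regression(x_fit, y_fit, d_fit, x_val, g, h)
        errors.append(float(np.mean((pred - np.asarray(y_val)) ** 2)))
    return gammas[int(np.argmin(errors))], errors
```

With gamma = 0 the estimator reduces to the complete-case Nadaraya-Watson estimator, which is a convenient sanity check. A call such as select_gamma(x_fit, y_fit, d_fit, x_val, y_val, [-1.0, 0.0, 1.0], h=0.2) returns the selected grid value together with the held-out errors for every candidate.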
Taxonomy
Topics: Statistical Methods and Inference
