Model Specification Test with Unlabeled Data: Approach from Covariate Shift
Masahiro Kato, Hikaru Kawarazaki

TL;DR
This paper introduces a new model specification test that utilizes unlabeled data to assess model correctness under distribution shifts, enhancing robustness and interpretability.
Contribution
It extends the definition of correct model specification to any distribution of explanatory variables and proposes a test leveraging unlabeled data for robustness against covariate shift.
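To see why a specification check confined to the training distribution can miss a problem, here is a minimal sketch. It is not the paper's test. The quadratic ground truth, the Gaussian covariate distributions, and the helper names `fit_linear` and `mean_residual` are all assumptions made for illustration. Labels for the shifted sample are simulated only so that the residuals can be inspected; the proposed test itself is described as using unlabeled test data.

```python
import numpy as np

def fit_linear(x, y):
    """Ordinary least squares fit of y on an intercept and x."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def mean_residual(beta, x, y):
    """Average of y minus the fitted linear prediction."""
    return float(np.mean(y - (beta[0] + beta[1] * x)))

rng = np.random.default_rng(0)
n = 5000

# Assumed ground truth: quadratic, so a linear model is misspecified.
def true_f(x):
    return x ** 2

# Training covariates centered at 0; shifted covariates centered at 2.
x_train = rng.normal(0.0, 1.0, n)
y_train = true_f(x_train) + rng.normal(0.0, 0.1, n)
x_shift = rng.normal(2.0, 1.0, n)
# Labels below exist only because this is a simulation.
y_shift = true_f(x_shift) + rng.normal(0.0, 0.1, n)

beta = fit_linear(x_train, y_train)
train_gap = mean_residual(beta, x_train, y_train)
shift_gap = mean_residual(beta, x_shift, y_shift)

print(f"mean residual on training covariates: {train_gap:.3e}")
print(f"mean residual on shifted covariates:  {shift_gap:.3f}")
```

In this setup the average residual on the training sample is zero up to floating-point error, because the fit includes an intercept, while the average residual under the shifted covariates is clearly nonzero. That gap is the kind of failure the extended definition is meant to rule out.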
Findings
Effective on synthetic datasets
Performs well on real-world data
Improves robustness to distribution shifts
Abstract
We propose a novel framework of the model specification test in regression using unlabeled test data. In many cases, we have conducted statistical inferences based on the assumption that we can correctly specify a model. However, it is difficult to confirm whether a model is correctly specified. To overcome this problem, existing works have devised statistical tests for model specification. Existing works have defined a correctly specified model in regression as a model with zero conditional mean of the error term over train data only. Extending the definition in conventional statistical tests, we define a correctly specified model as a model with zero conditional mean of the error term over any distribution of the explanatory variable. This definition is a natural consequence of the orthogonality of the explanatory variable and the error term. If a model does not satisfy this…
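One way to write the two definitions contrasted in the abstract is sketched below. The notation is assumed here rather than taken from the paper: $f(x;\theta)$ is the regression model, $\varepsilon = y - f(x;\theta)$ is the error term, $p_{\mathrm{tr}}$ is the training distribution of the explanatory variable, and $q$ is an arbitrary distribution of the explanatory variable.

```latex
% Conventional definition (training distribution only):
\mathbb{E}[\varepsilon \mid x] = 0
\quad \text{for } p_{\mathrm{tr}}\text{-almost every } x .

% Extended definition (any distribution of the explanatory variable):
\mathbb{E}[\varepsilon \mid x] = 0
\quad \text{for } q\text{-almost every } x, \ \text{for every distribution } q .
```

Under this reading, the extended condition implies the conventional one, since $p_{\mathrm{tr}}$ is one admissible choice of $q$.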
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Statistical Methods and Models · Fault Detection and Control Systems
Methods: Test · Interpretability
