Model Specification Test with Unlabeled Data: Approach from Covariate Shift

Masahiro Kato; Hikaru Kawarazaki

arXiv:1911.00688·stat.ME·February 25, 2020

Open Access

TL;DR

This paper introduces a model specification test for regression that uses unlabeled test data to check whether a model remains correctly specified under covariate shift, with the aim of improving robustness and interpretability.

Contribution

The paper extends the definition of correct model specification so that it must hold under any distribution of the explanatory variables, and proposes a test that uses unlabeled data to check robustness against covariate shift.
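The extended definition can be illustrated with a small simulation. The sketch below is not the test proposed in the paper; it only shows why a model that looks acceptable under one covariate distribution can fail under another. It assumes a hypothetical data-generating process y = x^2 + noise, two hypothetical covariate distributions N(0, 1) and N(2, 1), and simulated labels under both (in the paper's setting the test data are unlabeled). All function names are illustrative.

```python
import numpy as np


def ols(features, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return coef


def simulate(mean, n, rng):
    """Hypothetical process: x ~ N(mean, 1), y = x^2 + small Gaussian noise."""
    x = rng.normal(loc=mean, scale=1.0, size=n)
    y = x**2 + rng.normal(scale=0.1, size=n)
    return x, y


def linear_features(x):
    """Misspecified model: intercept and x only."""
    return np.column_stack([np.ones_like(x), x])


def quadratic_features(x):
    """Correctly specified model: intercept, x, and x^2."""
    return np.column_stack([np.ones_like(x), x, x**2])


rng = np.random.default_rng(0)
x_a, y_a = simulate(0.0, 5000, rng)  # "train" covariate distribution
x_b, y_b = simulate(2.0, 5000, rng)  # shifted covariate distribution

# Largest absolute change in fitted coefficients between the two distributions.
gap_linear = np.abs(
    ols(linear_features(x_a), y_a) - ols(linear_features(x_b), y_b)
).max()
gap_quadratic = np.abs(
    ols(quadratic_features(x_a), y_a) - ols(quadratic_features(x_b), y_b)
).max()

print(f"misspecified (linear) coefficient gap:   {gap_linear:.3f}")
print(f"correctly specified (quadratic) gap:     {gap_quadratic:.3f}")
```

Under these assumptions, the linear model's best-fitting coefficients move substantially when the covariate distribution shifts, while the quadratic model's coefficients stay close to each other. A model whose best fit depends on where the explanatory variable is sampled is one that the extended definition would treat as misspecified.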

Findings

01 Effective on synthetic datasets

02 Works well on real-world data

03 Improves robustness to distribution shifts

Abstract

We propose a novel framework for model specification testing in regression using unlabeled test data. Statistical inference is often conducted under the assumption that the model is correctly specified. However, it is difficult to confirm whether a model is correctly specified. To overcome this problem, existing works have devised statistical tests for model specification. These works define a correctly specified regression model as one with zero conditional mean of the error term over the training data only. Extending the definition used in conventional statistical tests, we define a correctly specified model as one with zero conditional mean of the error term over any distribution of the explanatory variable. This definition is a natural consequence of the orthogonality of the explanatory variable and the error term. If a model does not satisfy this…

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Statistical Methods and Models · Fault Detection and Control Systems

Methods: Test · Interpretability