The out-of-source error in multi-source cross validation-type procedures
Georgios Afendras, Marianthi Markatou

TL;DR
This paper introduces the out-of-source error in multi-source cross validation, providing an unbiased estimator, analyzing its variance, and establishing conditions for consistency, supported by simulation results.
Contribution
It defines the out-of-source error in multi-source data, proposes an unbiased estimator, and proves its consistency under broad conditions.
Findings
Unbiased estimator of out-of-source error derived.
Estimator's variance analyzed and discussed.
Simulation study confirms theoretical results.
Abstract
A scientific phenomenon under study may often be manifested by data arising from processes, i.e. sources, that may describe this phenomenon. In this context of multi-source data, we define the "out-of-source" error, that is, the error committed when a new observation of unknown source origin is allocated to one of the sources using a rule that is trained on the known labeled data. We present an unbiased estimator of this error, and discuss its variance. We derive natural and easily verifiable assumptions under which the consistency of our estimator is guaranteed for a broad class of loss functions and data distributions. Finally, we evaluate our theoretical results via a simulation study.
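To make the idea of an error measured across sources concrete, the following is a minimal sketch of a generic leave-one-source-out cross validation loop. It is not the estimator defined in the paper, whose exact form is not given in this summary. The function names (`leave_one_source_out_error`, `fit_least_squares`, `squared_loss`), the least-squares rule, the squared loss, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def fit_least_squares(X, y):
    """Hypothetical learning rule: ordinary least squares with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta


def squared_loss(y_true, y_pred):
    """Illustrative loss; any elementwise loss could be substituted."""
    return (y_true - y_pred) ** 2


def leave_one_source_out_error(sources, fit, loss):
    """Hold out each source in turn, train on the pooled remaining sources,
    and average the held-out losses over sources.

    `sources` is a list of (X, y) pairs, one pair per source.
    Returns the overall average and the list of per-source averages.
    """
    per_source = []
    for j, (X_test, y_test) in enumerate(sources):
        X_train = np.vstack([X for i, (X, _) in enumerate(sources) if i != j])
        y_train = np.concatenate([y for i, (_, y) in enumerate(sources) if i != j])
        predict = fit(X_train, y_train)
        per_source.append(float(loss(y_test, predict(X_test)).mean()))
    return float(np.mean(per_source)), per_source


# Synthetic example (assumed data): four sources that differ by an intercept shift.
rng = np.random.default_rng(0)
sources = []
for shift in (0.0, 0.5, 1.0, 1.5):
    X = rng.normal(size=(50, 2))
    y = shift + X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=50)
    sources.append((X, y))

estimate, per_source = leave_one_source_out_error(
    sources, fit_least_squares, squared_loss
)
print(f"estimated error across held-out sources: {estimate:.4f}")
```

Averaging per source rather than per observation gives each source equal weight regardless of its sample size; whether that weighting matches the paper's definition is an assumption of this sketch.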
