
TL;DR
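This paper proposes a bootstrap that resamples the rows and columns of a data matrix separately, for assessing the stability of algorithms fit to large data with a crossed, severely unbalanced random effects structure. It does not require fitting a crossed random effects model.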
Contribution
It proposes a row-column resampling bootstrap that achieves mean consistency in heteroscedastic crossed random effects models, avoiding the need for model fitting.
Findings
The bootstrap method is mean consistent under heteroscedastic crossed effects.
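Even for linear statistics, naive bootstrap sampling can be seriously misleading for such data, and McCullagh (2000) showed that no bootstrap method is exact.
Data with this structure arise in recommender engines, information retrieval problems, and many large weighted bipartite graphs.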
Abstract
Recently there has been much interest in data that, in statistical language, may be described as having a large crossed and severely unbalanced random effects structure. Such data sets arise for recommender engines and information retrieval problems. Many large bipartite weighted graphs have this structure too. We would like to assess the stability of algorithms fit to such data. Even for linear statistics, a naive form of bootstrap sampling can be seriously misleading and McCullagh [Bernoulli 6 (2000) 285--301] has shown that no bootstrap method is exact. We show that an alternative bootstrap separately resampling rows and columns of the data matrix satisfies a mean consistency property even in heteroscedastic crossed unbalanced random effects models. This alternative does not require the user to fit a crossed random effects model to the data.
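The resampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the data are stored as (row, column, value) triples for the observed entries only, that the statistic of interest is the overall mean, and that an observed entry's bootstrap weight is the product of how many times its row and its column were drawn. The function names `row_col_bootstrap_means` and `naive_bootstrap_means` are hypothetical, and the naive version (resampling entries as if they were independent) is included only for contrast.

```python
import numpy as np


def row_col_bootstrap_means(rows, cols, values, n_boot=200, seed=0):
    """Bootstrap the overall mean by resampling rows and columns separately.

    Assumption: an observed entry (i, j) is weighted by
    (times row i was drawn) * (times column j was drawn).
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Relabel row and column identifiers as 0..R-1 and 0..C-1.
    _, r_idx = np.unique(np.asarray(rows), return_inverse=True)
    _, c_idx = np.unique(np.asarray(cols), return_inverse=True)
    n_rows, n_cols = r_idx.max() + 1, c_idx.max() + 1

    means = np.empty(n_boot)
    for b in range(n_boot):
        # Draw rows and columns with replacement, independently of each other.
        row_counts = np.bincount(rng.integers(0, n_rows, n_rows), minlength=n_rows)
        col_counts = np.bincount(rng.integers(0, n_cols, n_cols), minlength=n_cols)
        w = row_counts[r_idx] * col_counts[c_idx]
        total = w.sum()
        # With sparse data a replicate can contain no observed entries.
        means[b] = (w * values).sum() / total if total > 0 else np.nan
    return means


def naive_bootstrap_means(values, n_boot=200, seed=0):
    """Contrast case: resample observed entries as if they were i.i.d."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.size
    return np.array([values[rng.integers(0, n, n)].mean() for _ in range(n_boot)])


# Toy data with row and column effects (synthetic, for illustration only).
rng = np.random.default_rng(42)
n_r, n_c = 30, 20
rows = np.repeat(np.arange(n_r), n_c)
cols = np.tile(np.arange(n_c), n_r)
values = (rng.normal(size=n_r)[rows] + rng.normal(size=n_c)[cols]
          + rng.normal(size=n_r * n_c))
keep = rng.random(n_r * n_c) < 0.3  # observe only some entries (unbalanced)
rows, cols, values = rows[keep], cols[keep], values[keep]

print("row-column bootstrap SD:", np.nanstd(row_col_bootstrap_means(rows, cols, values)))
print("naive bootstrap SD:     ", np.std(naive_bootstrap_means(values)))
```

Working with per-entry weights rather than materializing a resampled matrix keeps the cost of each replicate proportional to the number of observed entries, which matters when the matrix is large and sparse.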
