A model robust sub-sampling approach for Generalised Linear Models in Big data settings
Amalan Mahendran, Helen Thompson, James M. McGree

TL;DR
This paper introduces a model robust sub-sampling method for Generalised Linear Models in Big data settings. Instead of deriving sub-sampling probabilities from a single assumed model, as existing techniques do, it determines them from a set of candidate models, so the selected subset remains informative when the true model is uncertain.
Contribution
It proposes a novel model robust sub-sampling approach that evaluates the sub-sampling probabilities as a weighted average over a set of candidate models, removing the reliance on a single assumed model for the Big data.
Findings
Outperforms existing sub-sampling methods in simulations
Demonstrates improved inference accuracy in real-world applications
Provides theoretical support for the robustness of the approach
Abstract
In today's modern era of Big data, computationally efficient and scalable methods are needed to support timely insights and informed decision making. One such method is sub-sampling, where a subset of the Big data is analysed and used as the basis for inference rather than considering the whole data set. A key question when applying sub-sampling approaches is how to select an informative subset based on the questions being asked of the data. A recent approach for this has been proposed based on determining sub-sampling probabilities for each data point, but a limitation of this approach is that appropriate sub-sampling probabilities rely on an assumed model for the Big data. In this article, to overcome this limitation, we propose a model robust approach where a set of models is considered, and the sub-sampling probabilities are evaluated based on the weighted average of probabilities…
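The weighted-average step described in the abstract can be illustrated with a minimal Python sketch. The abstract is truncated, so it does not say how the per-model probabilities or the model weights are obtained. Here both are simply taken as inputs, and the function names `model_robust_probabilities` and `draw_subsample`, the toy probabilities, and the equal weights are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def model_robust_probabilities(per_model_probs, model_weights):
    """Weighted average of per-model sub-sampling probabilities.

    per_model_probs: shape (Q, N); row q holds the sub-sampling
        probabilities of the N data points under candidate model q
        and sums to 1. How these are derived is NOT specified here.
    model_weights: shape (Q,); non-negative weights that sum to 1
        (hypothetical input; the abstract does not state their source).
    """
    probs = np.asarray(per_model_probs, dtype=float)
    weights = np.asarray(model_weights, dtype=float)
    if probs.ndim != 2 or weights.shape != (probs.shape[0],):
        raise ValueError("expected probs of shape (Q, N) and weights of shape (Q,)")
    if np.any(probs < 0) or np.any(weights < 0):
        raise ValueError("probabilities and weights must be non-negative")
    if not np.allclose(probs.sum(axis=1), 1.0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("each probability row and the weights must sum to 1")
    # A convex combination of probability vectors is itself a probability vector.
    return weights @ probs


def draw_subsample(probabilities, size, seed=0):
    """Draw row indices with replacement according to the given probabilities."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(probabilities), size=size, replace=True, p=probabilities)


# Toy example: two candidate models, four data points, equal model weights.
toy_probs = [[0.1, 0.2, 0.3, 0.4],
             [0.4, 0.3, 0.2, 0.1]]
robust_probs = model_robust_probabilities(toy_probs, [0.5, 0.5])
subsample_indices = draw_subsample(robust_probs, size=10)
```

In this toy case the two models disagree about which points are informative, and the averaged probabilities come out uniform at 0.25 per point. In practice the per-model probabilities would come from whichever sub-sampling criterion is applied under each candidate model.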
Taxonomy
Topics: Advanced Statistical Methods and Models · Bayesian Modeling and Causal Inference · Statistical Methods and Inference
