Application of generalized linear models in big data: a divide and recombine (D&R) approach

Md. Mahadi Hassan Nayem; Soma Chowdhury Biswas

arXiv:2412.05018 · stat.ME · December 12, 2024



Open Access

TL;DR

This paper reviews divide and recombine (D&R) strategies for fitting generalized linear models (GLMs) to large datasets and proposes a new method for estimating the combined standard errors of D&R estimators, which achieve efficiency comparable to full-data analysis.

Contribution

It introduces a novel sequential partitioning method and a practical approach for estimating standard errors in D&R strategies for GLMs.

Findings

1. The proposed standard error estimation method is theoretically justified.
2. D&R estimators with the new standard error are as efficient as full-data estimates.
3. Validation on synthetic data confirms accuracy and consistency with existing R packages.

Abstract

Divide and recombine (D&R) is a statistical approach designed to handle large and complex datasets. It partitions the dataset into several manageable subsets, applies the analytic method to each subset independently, and then combines the subset results to yield results for the entire dataset. D&R strategies can be used to fit generalized linear models (GLMs) to datasets too large for conventional methods. Several D&R strategies are available for different GLMs; some are theoretically justified but lack practical validation. A significant limitation is the lack of theoretical and practical justification for estimating combined standard errors and confidence intervals. This paper reviews D&R strategies for GLMs and proposes a method to determine the combined standard error for D&R-based estimators. In addition to the traditional dataset division procedures, we propose a…
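To make the divide, apply, and recombine steps concrete, here is a minimal sketch for a logistic-regression GLM. It assumes contiguous row blocks as the partition and an information-weighted (inverse-variance) average as the recombination rule, with combined standard errors taken from the inverse of the pooled Fisher information. This is one common D&R combining scheme and is not necessarily the estimator, standard-error method, or sequential partitioning proposed in this paper. The function names `fit_logistic_irls` and `divide_and_recombine` are hypothetical, and the data are simulated.

```python
import numpy as np


def fit_logistic_irls(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic-regression GLM by IRLS (Fisher scoring).

    Returns the coefficient vector and the Fisher information X^T W X
    evaluated at the final estimate.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # inverse logit link
        w = mu * (1.0 - mu)                       # IRLS working weights
        info = X.T @ (w[:, None] * X)             # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ ((mu * (1.0 - mu))[:, None] * X)
    return beta, info


def divide_and_recombine(X, y, n_subsets):
    """Hypothetical D&R fit: split rows, fit each subset, pool by information weights."""
    # Divide: contiguous row blocks (an assumption, not the paper's partitioning scheme).
    blocks = np.array_split(np.arange(len(y)), n_subsets)
    betas, infos = [], []
    for idx in blocks:
        # Apply: fit the same GLM independently on each subset.
        b, info = fit_logistic_irls(X[idx], y[idx])
        betas.append(b)
        infos.append(info)
    # Recombine (assumed rule): information-weighted average of subset estimates.
    total_info = sum(infos)
    beta_dr = np.linalg.solve(total_info, sum(I @ b for I, b in zip(infos, betas)))
    # Combined standard errors from the inverse of the pooled information.
    se_dr = np.sqrt(np.diag(np.linalg.inv(total_info)))
    return beta_dr, se_dr


# Synthetic example (simulated data, not from the paper).
rng = np.random.default_rng(2024)
n = 20_000
beta_true = np.array([-0.5, 1.0, -2.0])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)

beta_full, info_full = fit_logistic_irls(X, y)
se_full = np.sqrt(np.diag(np.linalg.inv(info_full)))
beta_dr, se_dr = divide_and_recombine(X, y, n_subsets=10)

print("full-data beta:", beta_full, "SE:", se_full)
print("D&R beta:      ", beta_dr, "SE:", se_dr)
```

With independent, identically distributed rows, the D&R estimates and standard errors in this sketch should track the full-data fit closely; whether this pooled-information standard error matches the one proposed in the paper is not established here.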


Taxonomy

Topics: Advanced Data Processing Techniques · Neural Networks and Applications