Efficient Estimation for Generalized Linear Models on a Distributed System with Nonrandomly Distributed Data
Feifei Wang, Danyang Huang, Yingqiu Zhu, Hansheng Wang

TL;DR
This paper introduces a Pseudo-Newton-Raphson method for efficient, statistically sound estimation of generalized linear models in distributed systems with nonrandom data distribution, reducing communication and storage costs.
Contribution
It proposes a novel one-step estimator based on a pilot sample that achieves statistical efficiency and computational savings in distributed, nonrandom data settings.
Findings
The one-step estimator is statistically efficient and computationally scalable.
The method performs well in simulations and in real data analysis.
A likelihood ratio test based on the one-step estimator is developed for hypothesis testing.
Abstract
Distributed systems are widely used in practice to carry out large-scale data analysis tasks. In this work, we target the estimation of generalized linear models on a distributed system with nonrandomly distributed data. We develop a Pseudo-Newton-Raphson algorithm for efficient estimation. In this algorithm, we first obtain a pilot estimator based on a small random sample collected from different Workers. We then conduct a one-step update based on the derivatives of the log-likelihood function computed on each Worker at the pilot estimator. The final one-step estimator is proved to be as statistically efficient as the global estimator, even with nonrandomly distributed data. In addition, it is computationally efficient in terms of both communication cost and storage usage. Based on the one-step estimator, we also develop a likelihood ratio test for hypothesis testing. The…
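The two stages described in the abstract (a pilot estimator from a small random sample, then a single Newton-type update using derivatives aggregated from every Worker) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes logistic regression as the GLM, simulates Workers as in-memory (X, y) pairs, and draws the pilot sample by uniform subsampling within each Worker. All function names and the `pilot_frac` parameter are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_derivatives(X, y, beta):
    """Score and observed information of the logistic log-likelihood on one Worker."""
    p = sigmoid(X @ beta)
    score = X.T @ (y - p)                       # first derivative
    info = (X * (p * (1.0 - p))[:, None]).T @ X  # negative second derivative
    return score, info


def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson on a single data set (used for the pilot fit)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        score, info = local_derivatives(X, y, beta)
        beta = beta + np.linalg.solve(info, score)
    return beta


def one_step_estimator(workers, pilot_frac=0.05, seed=0):
    """Pilot fit on a small random sample, then one aggregated Newton step.

    `workers` is a list of (X, y) pairs; each pair stands in for the data
    held on one Worker (an assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)

    # Stage 1: collect a small random sample from every Worker and fit a pilot.
    X_parts, y_parts = [], []
    for X, y in workers:
        n_pilot = max(1, int(pilot_frac * len(y)))
        idx = rng.choice(len(y), size=n_pilot, replace=False)
        X_parts.append(X[idx])
        y_parts.append(y[idx])
    beta_pilot = fit_logistic(np.vstack(X_parts), np.concatenate(y_parts))

    # Stage 2: each Worker evaluates its derivatives at the pilot estimator;
    # only a p-vector and a p-by-p matrix per Worker need to be communicated.
    dim = beta_pilot.shape[0]
    total_score = np.zeros(dim)
    total_info = np.zeros((dim, dim))
    for X, y in workers:
        score, info = local_derivatives(X, y, beta_pilot)
        total_score += score
        total_info += info

    beta_one_step = beta_pilot + np.linalg.solve(total_info, total_score)
    return beta_pilot, beta_one_step
```

In this sketch the data may be split across Workers in any nonrandom way (for example, sorted by a covariate), because the second stage sums exact derivatives over all Workers rather than averaging local estimates.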
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Distributed Sensor Networks and Detection Algorithms · Statistical Methods and Inference
