GEE analysis of clustered binary data with diverging number of covariates
Lan Wang

TL;DR
This paper develops an asymptotic theory for GEE analysis of clustered binary data when the number of covariates diverges with the number of clusters, showing that sandwich-based inference remains valid even when the working correlation matrix is misspecified.
Contribution
It extends existing GEE theory to the setting where the number of covariates p grows with the number of clusters n, establishing the existence, consistency, and asymptotic normality of the GEE estimator and the validity of the sandwich variance formula.
Findings
GEE estimators are consistent and asymptotically normal with diverging p.
The sandwich variance formula remains valid even when the working correlation matrix is misspecified.
Numerical simulations confirm the accuracy of the asymptotic approximations.
Abstract
Clustered binary data with a large number of covariates have become increasingly common in many scientific disciplines. This paper develops an asymptotic theory for generalized estimating equations (GEE) analysis of clustered binary data when the number of covariates grows to infinity with the number of clusters. In this "large n, diverging p" framework, we provide appropriate regularity conditions and establish the existence, consistency and asymptotic normality of the GEE estimator. Furthermore, we prove that the sandwich variance formula remains valid. Even when the working correlation matrix is misspecified, the use of the sandwich variance formula leads to an asymptotically valid confidence interval and Wald test for an estimable linear combination of the unknown parameters. The accuracy of the asymptotic approximation is examined via numerical simulations. We also discuss the…
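To make the objects named in the abstract concrete, the following is a minimal sketch, not the paper's code, of a logistic GEE fit with an independence working correlation, followed by the sandwich variance and a Wald interval for a linear combination of the parameters. Everything specific here is an assumption made for illustration: the function names `simulate_clustered_binary` and `fit_gee_independence`, the Gaussian-copula mechanism used to induce within-cluster correlation, and the sample sizes and coefficient values. Because the simulated responses are correlated within clusters, the independence working correlation is deliberately misspecified, which is the situation in which the sandwich formula is needed.

```python
import math
import numpy as np


def simulate_clustered_binary(n_clusters, cluster_size, beta, rho, rng):
    """Binary responses with logistic marginals and correlated latent normals.

    Illustrative data-generating mechanism (an assumption, not from the paper).
    """
    p = beta.shape[0]
    X = 0.5 * rng.normal(size=(n_clusters, cluster_size, p))
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # marginal P(Y = 1)
    # Exchangeable latent normals: a shared cluster term plus individual noise.
    shared = rng.normal(size=(n_clusters, 1))
    noise = rng.normal(size=(n_clusters, cluster_size))
    z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise
    # Map to uniforms via the standard normal CDF, then threshold at mu,
    # so that P(Y = 1) equals mu exactly while responses stay correlated.
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    y = (u <= mu).astype(float)
    return X, y


def fit_gee_independence(X, y, n_iter=50, tol=1e-10):
    """Solve the GEE with logit link and independence working correlation.

    Returns the estimate and its sandwich covariance matrix.
    """
    n, m, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        score = np.einsum("imp,im->p", X, y - mu)        # estimating function
        H = np.einsum("imp,im,imq->pq", X, w, X)         # "bread" matrix
        step = np.linalg.solve(H, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the pieces at the final estimate.
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    H = np.einsum("imp,im,imq->pq", X, w, X)
    cluster_scores = np.einsum("imp,im->ip", X, y - mu)  # one row per cluster
    M = cluster_scores.T @ cluster_scores                # "meat" matrix
    H_inv = np.linalg.inv(H)
    cov = H_inv @ M @ H_inv                              # sandwich covariance
    return beta, cov


rng = np.random.default_rng(0)
beta_true = np.array([0.5, -0.5, 0.25, 0.0, 0.0])
X, y = simulate_clustered_binary(400, 4, beta_true, rho=0.5, rng=rng)
beta_hat, cov = fit_gee_independence(X, y)

# Wald 95% interval for the linear combination c' beta = beta_1 - beta_2.
c = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
est = float(c @ beta_hat)
se = math.sqrt(float(c @ cov @ c))
lo, hi = est - 1.96 * se, est + 1.96 * se
print(f"estimate = {est:.3f}, 95% Wald CI = ({lo:.3f}, {hi:.3f})")
```

The sketch fixes p at 5 for brevity; studying the diverging-p regime would mean letting the covariate dimension grow with `n_clusters` across repeated simulations and checking the empirical coverage of the interval.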
