Struggles with Survey Weighting and Regression Modeling

Andrew Gelman

arXiv:0710.5005·stat.ME·November 6, 2007


TL;DR

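This paper argues that Bayesian models for survey responses should condition on every variable that affects the probability of inclusion and nonresponse, which are the same variables used in survey weighting and clustering. It also describes how hard it remains to build practical multilevel models once those variables define thousands of poststratification cells.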

Contribution

The paper lays out why comprehensive Bayesian models for survey responses are difficult to construct, discusses the problem in the context of several ongoing public health and social surveys, and suggests directions for future research.

Findings

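01 Models become very complicated when the conditioning variables define many, potentially thousands of, poststratification cells

02 Valid Bayesian inference requires conditioning on all variables that affect the probability of inclusion and nonresponse

03 The work is open-ended and closes with thoughts on how research could proceed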

Abstract

The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of poststratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems.
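
To make the scale problem in the abstract concrete, the following is a minimal Python sketch of basic poststratification, not the paper's method. The variable names, level counts, cell estimates, and population counts are all hypothetical values chosen for illustration. It shows how crossing a handful of weighting variables already yields thousands of cells, and how a population estimate is formed as a population-weighted average of cell-level estimates.

```python
from math import prod


def n_cells(levels):
    """Number of poststratification cells from fully crossing the variables."""
    return prod(levels.values())


def poststratify(cell_estimates, cell_pop_counts):
    """Population-weighted average of cell estimates: sum(N_j * theta_j) / sum(N_j)."""
    total = sum(cell_pop_counts)
    return sum(n * t for n, t in zip(cell_pop_counts, cell_estimates)) / total


# Hypothetical weighting variables and their numbers of levels (assumed, not from the paper).
levels = {"sex": 2, "age": 4, "ethnicity": 4, "education": 4, "state": 50}
print(n_cells(levels))  # 2 * 4 * 4 * 4 * 50 cells

# Toy example with three cells: made-up cell estimates and population counts.
theta = [0.2, 0.5, 0.8]
N = [500, 300, 200]
estimate = poststratify(theta, N)
print(estimate)
```

With thousands of cells, many will contain few or no respondents, so the cell estimates themselves cannot be raw cell means; this is where the multilevel models the abstract refers to would come in.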
