Struggles with Survey Weighting and Regression Modeling
Andrew Gelman

TL;DR
This paper discusses the difficulty of building Bayesian models for survey data. Such models should condition on every variable that affects the probability of inclusion and nonresponse, yet doing so can produce thousands of poststratification cells, and practical multilevel models for that setting remain an open problem.
Contribution
It explores the difficulties in constructing comprehensive Bayesian models for survey responses and suggests directions for future research to address these challenges.
Findings
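Models that condition on all weighting and clustering variables quickly become complicated, with potentially thousands of poststratification cells
Valid Bayesian inference requires conditioning on all variables that affect the probability of inclusion and nonresponse
The work is open-ended and concludes with thoughts on how research could proceed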
Abstract
The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of poststratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss these issues in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems.
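To make the scale of the problem concrete, the following is a minimal sketch of how the number of poststratification cells grows when several weighting variables are crossed, together with the standard poststratified estimate (a population-weighted average of cell estimates). The variable names, category counts, and toy numbers are hypothetical and are not taken from the paper, and the sketch does not implement the multilevel models the abstract refers to.

```python
import math

# Hypothetical weighting variables and their numbers of categories
# (illustrative assumptions, not values from the paper).
levels = {"sex": 2, "age": 4, "ethnicity": 4, "education": 4, "region": 50}


def count_cells(levels):
    """Number of cells when all variables are fully crossed."""
    return math.prod(levels.values())


def poststratify(cell_estimates, cell_population):
    """Population-weighted average of per-cell estimates.

    cell_estimates:  dict mapping cell id -> estimated mean in that cell
    cell_population: dict mapping cell id -> known population count
    """
    total = sum(cell_population.values())
    weighted = sum(cell_population[c] * cell_estimates[c] for c in cell_population)
    return weighted / total


n_cells = count_cells(levels)

# Toy two-cell example with made-up numbers.
toy_estimates = {"cell_a": 0.2, "cell_b": 0.6}
toy_population = {"cell_a": 300, "cell_b": 100}
toy_theta = poststratify(toy_estimates, toy_population)

print(f"fully crossed cells: {n_cells}")
print(f"poststratified estimate: {toy_theta:.3f}")
```

With this many cells, most will contain few or no respondents, which is why the per-cell estimates cannot simply be raw cell means and some form of partial pooling across cells is needed.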
