Big Data meets Causal Survey Research: Understanding Nonresponse in the Recruitment of a Mixed-mode Online Panel
Barbara Felderer, Jannis Kueck, Martin Spindler

TL;DR
This paper introduces the double machine learning approach into survey statistics and shows how it can be used to analyze survey nonresponse in a high-dimensional panel setting, giving approximately unbiased estimates of causal parameters.
Contribution
It applies double machine learning to survey statistics, providing a novel method for causal analysis of nonresponse in high-dimensional online panel data.
Findings
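Double machine learning gives approximately unbiased estimators of causal parameters, whereas standard machine learning methods give biased estimates of causal relationships.
The approach handles high-dimensional survey data, such as data collected through online surveys and mobile applications.
It can be used to analyze survey nonresponse in a high-dimensional panel setting.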
Abstract
Survey scientists increasingly face the problem of high-dimensionality in their research as digitization makes it much easier to construct high-dimensional (or "big") data sets through tools such as online surveys and mobile applications. Machine learning methods are able to handle such data, and they have been successfully applied to solve predictive problems. However, in many situations, survey statisticians want to learn about causal relationships to draw conclusions and be able to transfer the findings of one survey to another. Standard machine learning methods provide biased estimates of such relationships. We introduce into survey statistics the double machine learning approach, which gives approximately unbiased estimators of causal parameters, and show how it can be used to analyze survey nonresponse in a high-dimensional panel setting.
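To make the idea in the abstract concrete, here is a minimal sketch of double machine learning with cross-fitting for a partially linear model. It is not the authors' implementation: the simulated data, the true effect of 0.5, the function names `ridge_fit_predict` and `dml_plr`, and the use of ridge regression as the nuisance learner are all assumptions chosen for illustration. In practice, any machine learning method could take the place of the ridge step.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression (with intercept) on the training data and
    predict on X_test. Stand-in for any ML learner (assumption)."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    p = Xc.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ coef


def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in
    y = theta * d + g(X) + noise (hypothetical helper)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    res_y = np.empty(n)
    res_d = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance functions are fit on the other folds only, then used
        # to residualize the held-out fold (cross-fitting).
        res_y[test_idx] = y[test_idx] - ridge_fit_predict(
            X[train], y[train], X[test_idx])
        res_d[test_idx] = d[test_idx] - ridge_fit_predict(
            X[train], d[train], X[test_idx])
    # Regress outcome residuals on treatment residuals.
    theta = np.sum(res_d * res_y) / np.sum(res_d ** 2)
    # Standard error from the orthogonal score.
    psi = res_d * (res_y - theta * res_d)
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(res_d ** 2)
    return theta, se


# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(42)
n, p = 2000, 50
X = rng.normal(size=(n, p))
beta = 1.0 / (1.0 + np.arange(p))           # effect of covariates on outcome
gamma = 1.0 / (1.0 + np.arange(p)) ** 2     # effect of covariates on treatment
d = X @ gamma + rng.normal(size=n)
y = 0.5 * d + X @ beta + rng.normal(size=n)  # true theta = 0.5

theta_hat, se = dml_plr(y, d, X)
print(f"theta_hat = {theta_hat:.3f}, se = {se:.3f}")
```

The two ingredients to notice are the residual-on-residual regression, which makes the estimate insensitive to small errors in the nuisance fits, and the sample splitting, which keeps the nuisance fits from overfitting the observations they are later applied to.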
Taxonomy
Topics: Survey Methodology and Nonresponse · Human Mobility and Location-Based Analysis · Data-Driven Disease Surveillance
