Analyzing establishment nonresponse using an interpretable regression tree model with linked administrative data
Polly Phipps, Daniell Toth

TL;DR
This paper employs an interpretable regression tree model to analyze establishment nonresponse in surveys, linking administrative data to identify characteristics associated with response propensity and potential nonresponse bias.
Contribution
It introduces a regression tree approach for modeling nonresponse, providing interpretable insights and comparing its effectiveness to logistic regression using linked administrative data.
Findings
The regression tree accurately models establishment response propensity.
Nonresponse may be nonignorable, so bias can remain unless proper adjustments are made.
Model outperforms logistic regression in interpretability and accuracy.
Abstract
To gain insight into how characteristics of an establishment are associated with nonresponse, a recursive partitioning algorithm is applied to the Occupational Employment Statistics May 2006 survey data to build a regression tree. The tree models an establishment's propensity to respond to the survey given certain establishment characteristics. It provides mutually exclusive cells based on the characteristics with homogeneous response propensities. This makes it easy to identify interpretable associations between the characteristic variables and an establishment's propensity to respond, something not easily done using a logistic regression propensity model. We test the model obtained using the May data against data from the November 2006 Occupational Employment Statistics survey. Testing the model on a disjoint set of establishment data with a very large sample size offers…
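The idea in the abstract can be illustrated with a minimal sketch: recursively partition establishments on their characteristics so that each resulting cell has a roughly homogeneous response rate, then check the fitted cell propensities on a disjoint sample. Everything below is an assumption made for illustration. The data are synthetic (not Occupational Employment Statistics data), the features `size_class` and `multi_unit` and the toy response probabilities are hypothetical, and the splitting rule is a generic squared-error criterion that may differ from the algorithm the authors used.

```python
import random

FEATURES = ("size_class", "multi_unit")  # hypothetical establishment characteristics

def make_data(n, seed):
    """Synthetic establishments: ((size_class, multi_unit), responded)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        size_class = rng.randint(1, 8)
        multi_unit = rng.randint(0, 1)
        # Assumed toy truth: larger and multi-unit establishments respond less.
        p = 0.85 - (0.25 if size_class > 5 else 0.0) - (0.20 if multi_unit else 0.0)
        rows.append(((size_class, multi_unit), 1 if rng.random() < p else 0))
    return rows

def sse(ys):
    """Sum of squared errors of the responses around their mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(rows, min_leaf):
    """Return (gain, feature_index, threshold) with the largest SSE reduction, or None."""
    base = sse([y for _, y in rows])
    best = None
    for j in range(len(FEATURES)):
        values = sorted({x[j] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[j] <= t]
            right = [y for x, y in rows if x[j] > t]
            if min(len(left), len(right)) < min_leaf:
                continue
            gain = base - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def grow(rows, depth=0, max_depth=3, min_leaf=50, min_gain=2.0):
    """Recursively partition rows; each node stores its size and mean response."""
    ys = [y for _, y in rows]
    node = {"n": len(rows), "propensity": sum(ys) / len(ys)}
    split = best_split(rows, min_leaf) if depth < max_depth else None
    if split is not None and split[0] > min_gain:
        _, j, t = split
        node["feature"], node["threshold"] = j, t
        node["left"] = grow([r for r in rows if r[0][j] <= t],
                            depth + 1, max_depth, min_leaf, min_gain)
        node["right"] = grow([r for r in rows if r[0][j] > t],
                             depth + 1, max_depth, min_leaf, min_gain)
    return node

def predict(node, x):
    """Follow the splits down to a leaf and return its estimated propensity."""
    while "feature" in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["propensity"]

def leaves(node, path=()):
    """List the mutually exclusive cells as (rule path, size, propensity)."""
    if "feature" not in node:
        return [(path, node["n"], node["propensity"])]
    name, t = FEATURES[node["feature"]], node["threshold"]
    return (leaves(node["left"], path + (f"{name} <= {t}",))
            + leaves(node["right"], path + (f"{name} > {t}",)))

def mse(rows, predictor):
    """Mean squared error of predicted propensities against observed responses."""
    return sum((y - predictor(x)) ** 2 for x, y in rows) / len(rows)

train = make_data(5000, seed=1)  # stands in for the sample used to build the tree
test = make_data(5000, seed=2)   # stands in for a disjoint sample used for testing
tree = grow(train)
cells = leaves(tree)
overall = sum(y for _, y in train) / len(train)
tree_mse = mse(test, lambda x: predict(tree, x))
base_mse = mse(test, lambda x: overall)

for path, n, p in cells:
    print(" and ".join(path) or "all establishments", f"n={n}", f"propensity={p:.3f}")
print(f"held-out MSE: tree={tree_mse:.4f}, constant={base_mse:.4f}")
```

Each printed line is one cell, described by a short chain of conditions on the characteristics, which is what makes this kind of model easy to read. Comparing the held-out error against a single overall response rate is only one simple check and is not claimed to be the evaluation used in the paper.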
