Automated Selection of Post-Strata using a Model-Assisted Regression Tree Estimator
Kelly S. McConville, Daniell Toth

TL;DR
This paper introduces a regression tree estimator for survey data that automatically selects post-strata, improving efficiency and interpretability compared to traditional methods, and demonstrates its consistency and performance with US labor statistics.
Contribution
It proposes a novel model-assisted regression tree estimator that automates post-stratum selection, addressing limitations of traditional models in survey estimation.
Findings
The estimator is consistent under certain conditions.
It outperforms traditional estimators in simulations.
It automatically selects meaningful post-strata using recursive partitioning.
Abstract
Auxiliary information can increase the efficiency of survey estimators through an assisting model when the model captures some of the relationship between the auxiliary data and the study variables. Despite their superior properties, model-assisted estimators are rarely used in anything but their simplest form by statistical agencies to produce official statistics. This is due to the fact that the more complicated models that have been used in model-assisted estimation are often ill suited to the available auxiliary data. Under a model-assisted framework, we propose a regression tree estimator for a finite population total. Regression tree models are adept at handling the type of auxiliary data usually available in the sampling frame and provide a model that is easy to explain and justify. The estimator can be viewed as a post-stratification estimator where the post-strata are…
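To make the post-stratification view concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a single numeric auxiliary variable known for every population unit, design weights equal to inverse inclusion probabilities, and a simple weighted sum-of-squares splitting rule with a minimum leaf size; the paper's actual splitting criteria may differ. The function names (`grow_cutpoints`, `tree_total`) are hypothetical. The leaves found on the sample act as post-strata, and the estimated total is the sum over leaves of the population count times the weighted sample mean in that leaf.

```python
from bisect import bisect_right
from collections import defaultdict


def weighted_mean(ys, ws):
    """Design-weighted mean of ys with weights ws."""
    return sum(w * y for y, w in zip(ys, ws)) / sum(ws)


def weighted_sse(ys, ws):
    """Design-weighted sum of squared deviations from the weighted mean."""
    m = weighted_mean(ys, ws)
    return sum(w * (y - m) ** 2 for y, w in zip(ys, ws))


def grow_cutpoints(x, y, w, min_leaf=2):
    """Recursively split the sample on one numeric auxiliary variable.

    Assumed splitting rule (illustrative only): pick the cut that minimizes
    the weighted SSE, keep at least `min_leaf` sample units per leaf, and
    stop when no cut reduces the SSE. Returns a sorted list of cutpoints.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    ws = [w[i] for i in order]
    total = weighted_sse(ys, ws)
    best = None
    for k in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[k - 1] == xs[k]:
            continue  # cannot cut between tied x values
        sse = weighted_sse(ys[:k], ws[:k]) + weighted_sse(ys[k:], ws[k:])
        if best is None or sse < best[0]:
            best = (sse, k)
    if best is None or best[0] >= total:
        return []  # no admissible split improves the fit
    k = best[1]
    cut = (xs[k - 1] + xs[k]) / 2
    left = grow_cutpoints(xs[:k], ys[:k], ws[:k], min_leaf)
    right = grow_cutpoints(xs[k:], ys[k:], ws[k:], min_leaf)
    return left + [cut] + right


def tree_total(x_s, y_s, w_s, x_pop, min_leaf=2):
    """Post-stratified estimate of the population total of y.

    Leaves of the tree grown on the sample serve as post-strata; each
    population unit is assigned to a leaf through its auxiliary value.
    """
    cuts = grow_cutpoints(x_s, y_s, w_s, min_leaf)
    leaf_y, leaf_w = defaultdict(list), defaultdict(list)
    for xi, yi, wi in zip(x_s, y_s, w_s):
        g = bisect_right(cuts, xi)  # leaf index of this sample unit
        leaf_y[g].append(yi)
        leaf_w[g].append(wi)
    pop_counts = defaultdict(int)
    for xi in x_pop:
        pop_counts[bisect_right(cuts, xi)] += 1
    total = sum(n_g * weighted_mean(leaf_y[g], leaf_w[g])
                for g, n_g in pop_counts.items())
    return total, cuts


# Toy data (made up for illustration): 8 population units, 4 sampled,
# each sampled unit carrying a design weight of 2.
x_pop = [1, 2, 3, 4, 5, 6, 7, 8]
x_s, y_s, w_s = [1, 2, 7, 8], [10, 10, 20, 20], [2, 2, 2, 2]
estimate, cuts = tree_total(x_s, y_s, w_s, x_pop)
print(estimate, cuts)
```

Because each leaf mean is the design-weighted mean, the weighted residuals sum to zero within every leaf, so the usual model-assisted correction term vanishes and the estimator collapses to this post-stratified form. Every leaf contains at least `min_leaf` sample units by construction, so no post-stratum is empty in the sample.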
