Endogenous post-stratification in surveys: classifying with a sample-fitted model
F. Jay Breidt, Jean D. Opsomer

TL;DR
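This paper develops a theoretical framework for endogenous post-stratification in surveys, where the post-strata are derived from a model fitted to the sample itself. It shows that the resulting estimator is design consistent under mild conditions, and uses simulations to show that fitting the model before post-stratifying has little practical effect.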
Contribution
It derives the properties of the endogenous post-stratification estimator, including its consistency, when the post-strata come from a sample-fitted model, extending traditional post-stratification to model-based classification.
Findings
The estimator is design consistent under mild conditions.
The estimator has the same asymptotic variance as the traditional post-stratified estimator.
Simulations show that fitting the model before post-stratifying has only a small practical effect.
Abstract
Post-stratification is frequently used to improve the precision of survey estimators when categorical auxiliary information is available from sources outside the survey. In natural resource surveys, such information is often obtained from remote sensing data, classified into categories and displayed as pixel-based maps. These maps may be constructed based on classification models fitted to the sample data. Post-stratification of the sample data based on categories derived from the sample data ("endogenous post-stratification") violates the standard post-stratification assumptions that observations are classified without error into post-strata, and post-stratum population counts are known. Properties of the endogenous post-stratification estimator are derived for the case of a sample-fitted generalized linear model, from which the post-strata are constructed by dividing the range of…
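The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions that are not in the source: the sample is a simple random sample, one auxiliary variable is known for every population element (for example, one value per pixel), an identity-link least-squares fit stands in for the generalized linear model, and the cut points passed in `cuts` are arbitrary. The function name `endogenous_post_stratified_mean` and all parameter names are hypothetical.

```python
import numpy as np


def endogenous_post_stratified_mean(x_pop, sample_idx, y_sample, cuts):
    """Post-stratified mean with strata built from a sample-fitted model.

    Assumptions (illustrative only): simple random sampling, a single
    auxiliary variable x known for the whole population, and an
    identity-link linear fit as a stand-in for a generalized linear model.
    """
    x_pop = np.asarray(x_pop, dtype=float)
    y_sample = np.asarray(y_sample, dtype=float)
    sample_idx = np.asarray(sample_idx)

    # 1. Fit y ~ a + b * x using the sample only.
    x_s = x_pop[sample_idx]
    design = np.column_stack([np.ones_like(x_s), x_s])
    coef, *_ = np.linalg.lstsq(design, y_sample, rcond=None)

    # 2. Predict for every population element with the sample-fitted model.
    pred_pop = coef[0] + coef[1] * x_pop

    # 3. Form post-strata by cutting the predictions at fixed boundaries.
    strata_pop = np.digitize(pred_pop, cuts)
    strata_s = strata_pop[sample_idx]

    # 4. Weight each stratum's sample mean by its population share.
    estimate = 0.0
    for h in np.unique(strata_pop):
        in_sample = strata_s == h
        if not in_sample.any():
            raise ValueError(f"post-stratum {h} contains no sampled units")
        share = np.mean(strata_pop == h)  # N_h / N, computed from the fit
        estimate += share * y_sample[in_sample].mean()
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=1000)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=1000)
    idx = rng.choice(1000, size=100, replace=False)
    est = endogenous_post_stratified_mean(x, idx, y[idx], cuts=[1.67, 2.33])
    print("estimate:", est, "population mean:", y.mean())
```

Both the stratum membership of each sampled unit and the stratum shares in step 4 depend on coefficients estimated from the same sample. This is the departure from the standard assumptions that the abstract describes.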
