Small area prediction of counts under machine learning-type mixed models
Nicolas Frink, Timo Schmid

TL;DR
This paper introduces two random forest-based small area estimation methods for count data that handle overdispersion, together with bootstrap techniques for assessing the reliability of the estimates. The methods are validated through simulations and real data.
Contribution
The paper develops two new random forest-based small area estimation methods for count data, addressing overdispersion and providing bootstrap-based reliability measures.
Findings
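MERF, which does not assume a Poisson distribution for the mean behavior of the counts, performs best under severe overdispersion.
GMERF performs best when the Poisson distribution assumptions are moderately met.
Three bootstrap methods, one parametric and two non-parametric, are introduced and evaluated for assessing estimator uncertainty.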
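To make the notion of overdispersion in these findings concrete, here is a minimal sketch (not taken from the paper) that compares the variance-to-mean ratio of Poisson counts with that of gamma-mixed Poisson counts sharing the same mean. The function names, sample size, mean, and gamma shape are illustrative assumptions, and the simple samplers below are stand-ins rather than anything the authors use.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 if overdispersed."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
mean = 5.0               # hypothetical common mean for both samples
shape = 2.0              # hypothetical gamma shape; smaller means more overdispersion
size = 5000

# Equidispersed counts: the Poisson variance equals its mean.
poisson_counts = [sample_poisson(rng, mean) for _ in range(size)]

# Overdispersed counts: the Poisson rate itself varies as a gamma variable
# with the same mean, which yields negative binomial counts whose variance
# is mean + mean**2 / shape.
mixed_counts = [
    sample_poisson(rng, rng.gammavariate(shape, mean / shape))
    for _ in range(size)
]

poisson_ratio = dispersion_index(poisson_counts)
mixed_ratio = dispersion_index(mixed_counts)
print(f"Poisson dispersion index:       {poisson_ratio:.2f}")
print(f"Gamma-Poisson dispersion index: {mixed_ratio:.2f}")
```

A ratio well above 1 is the kind of setting in which, according to the findings, a method that does not rely on the Poisson assumption would be expected to have the advantage.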
Abstract
This paper proposes small area estimation methods that utilize generalized tree-based machine learning techniques to improve the estimation of disaggregated means in small areas using discrete survey data. Specifically, we present two approaches based on random forests: the Generalized Mixed Effects Random Forest (GMERF) and a Mixed Effects Random Forest (MERF), both tailored to address challenges associated with count outcomes, particularly overdispersion. Our analysis reveals that the MERF, which does not assume a Poisson distribution to model the mean behavior of count data, excels in scenarios of severe overdispersion. Conversely, the GMERF performs best under conditions where Poisson distribution assumptions are moderately met. Additionally, we introduce and evaluate three bootstrap methodologies - one parametric and two non-parametric - designed to assess the reliability of point…
Taxonomy
Topics: Forecasting Techniques and Applications
