Pseudo-clustering for combining data sets with multiple hierarchies

Seho Park; A James O'Malley

arXiv:2309.13699·stat.ME·September 26, 2023

Open Access

TL;DR

This paper introduces a pseudo-clustering method to combine datasets with different hierarchical structures and sampling weights, enabling accurate multi-level modeling of complex survey data.

Contribution

It proposes a novel pseudo-cluster approach for unifying diverse hierarchical survey data, allowing unbiased estimation of model parameters with sampling weights.

Findings

01

Considering sampling weights yields unbiased parameter estimates.

02

The method improves variance component estimation in multi-level models.

03

Simulation studies validate the approach's effectiveness.

Abstract

Multi-level modeling is an important approach for analyzing complex survey data collected through multi-stage sampling. However, estimating multi-level models can be challenging when we combine several datasets that have distinct hierarchies and sampling weights. This paper presents a method for combining multiple datasets whose hierarchical structures differ because distinct informative sampling designs were used for the same survey. To develop an approach with complete generality, we propose to define a pseudo-cluster, a cluster containing only a singleton observation, to unify the data structure and thereby enable estimation of multi-level models incorporating sampling weights across the combined sample. We justify incorporating sampling weights at each level of the hierarchical model and in doing so define a pseudo-likelihood estimation procedure. Simulation studies are used to illustrate the effect of…
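
The pseudo-cluster idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The field names (`unit_id`, `cluster_id`, `cluster_weight`, `unit_weight`), the helper `add_pseudo_clusters`, and the toy values are all hypothetical. The weight handling is also an assumption: each unclustered observation's overall weight is moved to the cluster level and its within-cluster weight is set to 1, so the product of the two weights is unchanged. The paper may treat the weights differently.

```python
from typing import Dict, List

Row = Dict[str, object]


def add_pseudo_clusters(rows: List[Row], prefix: str = "pseudo_") -> List[Row]:
    """Return copies of rows in which every row has a cluster ID and two weights."""
    unified = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row.get("cluster_id") is None:
            # Singleton pseudo-cluster: one cluster per unclustered observation.
            new_row["cluster_id"] = f"{prefix}{new_row['unit_id']}"
            # Assumption (not from the paper): the overall weight becomes the
            # cluster-level weight and the within-cluster weight is set to 1.
            new_row["cluster_weight"] = new_row["unit_weight"]
            new_row["unit_weight"] = 1.0
        unified.append(new_row)
    return unified


# Hypothetical sample A: two-stage design (clusters, then units within clusters).
sample_a = [
    {"unit_id": "a1", "cluster_id": "c1", "cluster_weight": 10.0, "unit_weight": 2.0, "y": 3.1},
    {"unit_id": "a2", "cluster_id": "c1", "cluster_weight": 10.0, "unit_weight": 2.0, "y": 2.7},
]
# Hypothetical sample B: units sampled directly, with no cluster stage.
sample_b = [
    {"unit_id": "b1", "cluster_id": None, "cluster_weight": None, "unit_weight": 25.0, "y": 4.0},
    {"unit_id": "b2", "cluster_id": None, "cluster_weight": None, "unit_weight": 40.0, "y": 1.9},
]

combined = add_pseudo_clusters(sample_a + sample_b)

# Count the observations in each cluster of the unified two-level structure.
cluster_sizes: Dict[str, int] = {}
for row in combined:
    key = str(row["cluster_id"])
    cluster_sizes[key] = cluster_sizes.get(key, 0) + 1
```

After this step every record carries the same two-level layout, so a single weighted multi-level model could in principle be fitted to `combined`. The fitting step itself is not shown here.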

Taxonomy

Topics: Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models · demographic modeling and climate adaptation