Reclustering: A New Method to Test the Appropriate Level of Clustering
Kentaro Fukumoto

TL;DR
This paper introduces reclustering, a finite-sample method for testing the appropriate level of clustering: fine clusters are randomly and repeatedly grouped into new gross clusters, and a cluster-sensitive statistic computed under the original clustering is checked against the values obtained from the regroupings.
Contribution
It proposes a novel reclustering approach that works in finite samples to determine the proper clustering level, addressing limitations of existing asymptotic methods.
Findings
Reclustering effectively identifies the correct clustering level in simulations.
The method performs well compared to previous tests in empirical applications.
Reclustering provides a finite-sample alternative for cluster validity testing.
Abstract
When scholars suspect units are dependent on each other within clusters but independent of each other across clusters, they employ cluster-robust standard errors (CRSEs). Nevertheless, what to cluster over is sometimes unknown. For instance, in the case of cross-sectional survey samples, clusters may be households, municipalities, counties, or states. A few approaches have been proposed, although they are based on asymptotics. I propose a new method to address this issue that works in a finite sample: reclustering. That is, we randomly and repeatedly group fine clusters into new gross clusters and calculate a statistic such as CRSEs. Under the null hypothesis that fine clusters are independent of each other, how they are grouped into gross clusters should not matter for any cluster-sensitive statistic. Thus, if the statistic based on the original clustering is a significant outlier…
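The reclustering idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The function names (`crse_of_mean`, `reclustering_test`) are hypothetical. The statistic is the cluster-robust standard error of a sample mean without any small-sample correction. Fine clusters are assumed to be nested within gross clusters, and regrouping keeps the original gross-cluster sizes by permuting the fine-to-gross assignment. The one-sided permutation-style p-value is also an assumption, since the abstract is truncated before it states the decision rule.

```python
import numpy as np


def crse_of_mean(y, cluster_ids):
    """Cluster-robust SE of the sample mean (no small-sample correction)."""
    y = np.asarray(y, dtype=float)
    resid = y - y.mean()
    # Sum residuals within each cluster, then combine the squared sums.
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    cluster_sums = np.bincount(inverse, weights=resid)
    return np.sqrt(np.sum(cluster_sums ** 2)) / len(y)


def reclustering_test(y, fine_ids, gross_ids, n_draws=999, seed=0):
    """Compare the CRSE under the original gross clustering with CRSEs
    obtained by randomly regrouping fine clusters into gross clusters.

    Assumption: every fine cluster lies inside exactly one gross cluster.
    """
    rng = np.random.default_rng(seed)
    fine_ids = np.asarray(fine_ids)
    gross_ids = np.asarray(gross_ids)

    # Map each unit to the index of its fine cluster.
    fine_labels, fine_inverse = np.unique(fine_ids, return_inverse=True)

    # Record the original gross cluster of each fine cluster.
    gross_of_fine = np.empty(len(fine_labels), dtype=gross_ids.dtype)
    gross_of_fine[fine_inverse] = gross_ids

    observed = crse_of_mean(y, gross_ids)

    draws = np.empty(n_draws)
    for b in range(n_draws):
        # Shuffle which gross cluster each fine cluster belongs to,
        # keeping the number of fine clusters per gross cluster fixed.
        shuffled = rng.permutation(gross_of_fine)
        draws[b] = crse_of_mean(y, shuffled[fine_inverse])

    # Assumed one-sided rule: a large observed CRSE relative to the
    # reclustered ones suggests dependence across fine clusters.
    p_value = (1 + np.sum(draws >= observed)) / (1 + n_draws)
    return observed, draws, p_value
```

To use the sketch, pass a vector of outcomes together with fine-cluster labels (for example, households) and gross-cluster labels (for example, counties). A small p-value indicates that the statistic under the original gross clustering is unusually large compared with random regroupings, which is evidence against independence across fine clusters.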
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Advanced Statistical Methods and Models · Census and Population Estimation
