
TL;DR
This paper introduces a method to discover unknown clusters in panel data by thresholding a long-run variance-covariance matrix, enabling more flexible inference without prior cluster knowledge.
Contribution
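The proposed procedure identifies clusters in panel data from the data itself, removing the need to specify the cluster structure in advance. The estimated clusters can then be used in approximate randomization tests or with conventional cluster-robust covariance estimators.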
Findings
Recovers true clusters with high probability
Controls size and maintains good power in tests
Requires a panel that is large in the time dimension, but imposes no lower bound on the number of units
Abstract
Clustered standard errors and approximate randomization tests are popular inference methods that allow for dependence within observations. However, they require researchers to know the cluster structure ex ante. We propose a procedure to help researchers discover clusters in panel data. Our method is based on thresholding an estimated long-run variance-covariance matrix and requires the panel to be large in the time dimension, but imposes no lower bound on the number of units. We show that our procedure recovers the true clusters with high probability with no assumptions on the cluster structure. The estimated clusters are of independent interest, but they can also be used in approximate randomization tests or with conventional cluster-robust covariance estimators. The resulting procedures control size and have good power.
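The thresholding idea in the abstract can be illustrated with a minimal sketch. This is not the paper's exact procedure. It assumes a Bartlett-kernel (Newey-West style) estimator for the long-run variance-covariance matrix, a fixed user-chosen threshold applied to the implied correlations, and clusters read off as connected components of the thresholded matrix. The names `long_run_cov` and `discover_clusters`, the `bandwidth` and `threshold` parameters, and the simulated panel are all hypothetical choices made for illustration.

```python
import numpy as np


def long_run_cov(X, bandwidth):
    """Bartlett-kernel long-run covariance of a T x N panel (assumed estimator)."""
    T = X.shape[0]
    Xc = X - X.mean(axis=0)            # demean each unit's time series
    omega = Xc.T @ Xc / T              # lag-0 autocovariance
    for lag in range(1, bandwidth + 1):
        w = 1.0 - lag / (bandwidth + 1.0)      # Bartlett weight
        gamma = Xc[lag:].T @ Xc[:-lag] / T     # lag-`lag` autocovariance
        omega += w * (gamma + gamma.T)
    return omega


def discover_clusters(X, bandwidth, threshold):
    """Label units by connected components of the thresholded matrix.

    Two units are linked when the absolute long-run correlation between
    them exceeds `threshold` (an illustrative tuning choice).
    """
    omega = long_run_cov(X, bandwidth)
    scale = np.sqrt(np.diag(omega))
    corr = omega / np.outer(scale, scale)
    adj = np.abs(corr) > threshold

    n_units = adj.shape[0]
    labels = -np.ones(n_units, dtype=int)
    current = 0
    for start in range(n_units):
        if labels[start] >= 0:
            continue
        # Depth-first search over units linked to `start`.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Simulated panel: 6 units, long time dimension, two clusters that each
# share a common shock (units 0-2 and units 3-5).
rng = np.random.default_rng(0)
T = 2000
factors = rng.standard_normal((T, 2))
noise = rng.standard_normal((T, 6))
X = factors[:, [0, 0, 0, 1, 1, 1]] + 0.5 * noise

labels = discover_clusters(X, bandwidth=5, threshold=0.4)
print(labels)
```

The returned labels could then be passed as the cluster identifier to a cluster-robust covariance estimator or an approximate randomization test. In practice the threshold would need to be chosen in a data-driven way, which this sketch does not attempt.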
Taxonomy
Topics: Spatial and Panel Data Analysis · Statistical Methods and Bayesian Inference · Economic and Environmental Valuation
Methods: Linear Regression
