Panel Data with Unknown Clusters

Yong Cai

arXiv:2106.05503·econ.EM·January 14, 2022

TL;DR

This paper introduces a method to discover unknown clusters in panel data by thresholding an estimated long-run variance-covariance matrix, so that cluster-based inference no longer requires knowing the cluster structure in advance.

Contribution

The proposed procedure estimates the cluster structure of a panel from the data, so researchers no longer need to specify clusters ex ante before applying clustered standard errors or approximate randomization tests.

Findings

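01 Recovers the true clusters with high probability, with no assumptions on the cluster structure

02 Tests based on the estimated clusters control size and have good power

03 Requires the panel to be large in the time dimension, but imposes no lower bound on the number of units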

Abstract

Clustered standard errors and approximate randomization tests are popular inference methods that allow for dependence within observations. However, they require researchers to know the cluster structure ex ante. We propose a procedure to help researchers discover clusters in panel data. Our method is based on thresholding an estimated long-run variance-covariance matrix and requires the panel to be large in the time dimension, but imposes no lower bound on the number of units. We show that our procedure recovers the true clusters with high probability with no assumptions on the cluster structure. The estimated clusters are of independent interest, but they can also be used in the approximate randomization tests or with conventional cluster-robust covariance estimators. The resulting procedures control size and have good power.
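The thresholding idea in the abstract can be illustrated with a minimal sketch. This is not the paper's exact procedure: the Bartlett-kernel estimator, the rescaling to correlations, the fixed `cutoff`, and the rule that clusters are the connected components of the thresholded matrix are all assumptions made for illustration, and the names `long_run_cov` and `threshold_clusters` are hypothetical.

```python
import numpy as np


def long_run_cov(x, bandwidth):
    """Bartlett-kernel (Newey-West style) long-run covariance of a T x N array.

    Assumption: this is one common estimator; the paper may use another.
    """
    x = x - x.mean(axis=0)            # demean each unit's series
    t = x.shape[0]
    omega = x.T @ x / t               # lag-0 covariance
    for lag in range(1, bandwidth + 1):
        weight = 1.0 - lag / (bandwidth + 1.0)
        gamma = x[lag:].T @ x[:-lag] / t   # lag-`lag` autocovariance
        omega += weight * (gamma + gamma.T)
    return omega


def threshold_clusters(x, bandwidth, cutoff):
    """Label units by connected components of the thresholded matrix.

    Assumption: two units are linked when the absolute long-run
    correlation between them exceeds `cutoff` (a hypothetical tuning
    parameter, not a value taken from the paper).
    """
    omega = long_run_cov(x, bandwidth)
    scale = np.sqrt(np.diag(omega))
    corr = omega / np.outer(scale, scale)
    linked = np.abs(corr) > cutoff

    n = linked.shape[0]
    labels = -np.ones(n, dtype=int)   # -1 marks "not yet visited"
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        # Depth-first search over units linked to `start`.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Calling `threshold_clusters(x, bandwidth=4, cutoff=0.4)` on a T x N array `x` returns one integer label per unit. The returned labels could then be passed as the grouping variable to a cluster-robust covariance estimator or an approximate randomization test, which is the use the abstract describes.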

Taxonomy

Topics: Spatial and Panel Data Analysis · Statistical Methods and Bayesian Inference · Economic and Environmental Valuation

Methods: Linear Regression