PCA Rerandomization

Hengtao Zhang; Guosheng Yin; Donald B. Rubin

arXiv:2102.12262·stat.ME·February 25, 2021

TL;DR

This paper introduces PCA rerandomization, which computes the Mahalanobis balance criterion in a subspace chosen by principal component analysis rather than on all covariates, improving covariate balance and average treatment effect estimation in high-dimensional experiments.

Contribution

The paper proposes a PCA-based rerandomization scheme that restricts the Mahalanobis distance to the top principal components, reducing dimensionality while capturing most of the information in the covariates.

Findings

01

In high-dimensional data, PCA rerandomization balances covariates more effectively than rerandomization based on the Mahalanobis distance over all covariates.

02

The method improves the accuracy of average treatment effect estimation.

03

Numerical studies on simulated and real data support the theoretical advantages.

Abstract

Mahalanobis distance between treatment group and control group covariate means is often adopted as a balance criterion when implementing a rerandomization strategy. However, this criterion may not work well for high-dimensional cases because it balances all orthogonalized covariates equally. Here, we propose leveraging principal component analysis (PCA) to identify proper subspaces in which Mahalanobis distance should be calculated. Not only can PCA effectively reduce the dimensionality for high-dimensional cases while capturing most of the information in the covariates, but it also provides computational simplicity by focusing on the top orthogonal components. We show that our PCA rerandomization scheme has desirable theoretical properties on balancing covariates and thereby on improving the estimation of average treatment effects. We also show that this conclusion is supported by…
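The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of `k` top components, the fixed acceptance `threshold`, the equal group sizes, and the `max_draws` cap are all assumptions made here for illustration. The sketch projects the covariates onto the top-k principal components, then redraws the treatment assignment until the Mahalanobis distance between group means in that subspace falls below the threshold.

```python
import numpy as np

def pca_scores(X, k):
    """Project centered covariates onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # n x k matrix of component scores

def mahalanobis_balance(Z, w):
    """Mahalanobis distance between treated and control means of Z."""
    n_t, n_c = w.sum(), (1 - w).sum()
    diff = Z[w == 1].mean(axis=0) - Z[w == 0].mean(axis=0)
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    return (n_t * n_c / (n_t + n_c)) * diff @ np.linalg.solve(cov, diff)

def pca_rerandomize(X, k, threshold, rng, max_draws=10_000):
    """Redraw assignments until balance in the PCA subspace is acceptable.

    k, threshold, and max_draws are illustrative assumptions, not values
    taken from the paper.
    """
    n = X.shape[0]
    Z = pca_scores(X, k)
    base = np.zeros(n, dtype=int)
    base[: n // 2] = 1  # assumed design: half the units are treated
    for _ in range(max_draws):
        w = rng.permutation(base)
        if mahalanobis_balance(Z, w) <= threshold:
            return w
    raise RuntimeError("no acceptable allocation found within max_draws")

# Example with synthetic data: 100 units, 50 covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
w = pca_rerandomize(X, k=3, threshold=1.0, rng=rng)
```

Because the component scores are uncorrelated, the distance in the subspace reduces to a sum of squared standardized mean differences over the retained components, which is why restricting to a few components keeps the criterion cheap to evaluate.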
