Optimal Data Collection for Randomized Control Trials

Pedro Carneiro; Sokbae Lee; Daniel Wilhelm

arXiv:1603.03675 · stat.ME · September 27, 2017


TL;DR

This paper introduces a method for optimizing data collection in randomized control trials. It uses pre-experimental data to choose both the sample size and the covariates to collect, which can reduce costs or improve estimator precision.

Contribution

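It proposes a simple, tuning-free modification of an orthogonal greedy algorithm. The method uses pre-experimental data to choose the sample size and covariates that minimize the mean squared error of the average treatment effect estimator, subject to the researcher's budget constraint.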
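The trade-off behind this contribution can be sketched numerically. The sketch assumes a hypothetical linear cost model, in which each respondent has a fixed cost plus a per-respondent cost for every collected covariate. It also uses the textbook large-sample variance of a regression-adjusted difference in means, σ²(1 − R²)/(n·p(1 − p)), rather than the paper's own criterion. Under these assumptions, a covariate is worth collecting only if the variance it removes outweighs the sample size it costs. The function names are illustrative and do not come from the paper.

```python
import math


def budget_sample_size(budget, cost_per_unit, covariate_costs):
    """Largest sample size affordable under a hypothetical linear cost model:
    each respondent costs cost_per_unit plus the cost of every covariate."""
    per_person = cost_per_unit + sum(covariate_costs)
    return math.floor(budget / per_person)


def approx_ate_variance(outcome_var, r_squared, n, p_treat=0.5):
    """Large-sample variance proxy for a regression-adjusted difference in means.

    outcome_var * (1 - r_squared) is the residual outcome variance after
    adjusting for the covariates; p_treat is the share assigned to treatment.
    """
    return outcome_var * (1.0 - r_squared) / (n * p_treat * (1.0 - p_treat))


# Budget of 10,000, base cost of 10 per respondent, outcome variance of 1.
n_plain = budget_sample_size(10_000, 10, [])      # no covariates collected
n_cov = budget_sample_size(10_000, 10, [2.5])     # one covariate costing 2.5
var_plain = approx_ate_variance(1.0, 0.0, n_plain)
var_cov = approx_ate_variance(1.0, 0.3, n_cov)    # covariate explains 30%
print(n_plain, var_plain, n_cov, var_cov)
```

In this toy setting, the covariate pays for itself. The affordable sample shrinks from 1,000 to 800, but the variance proxy falls from 0.004 to about 0.0035. With a weaker covariate (a smaller R²) or a more expensive one, the comparison can flip.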

Findings

1. Up to 58% reduction in data collection costs in two empirical applications.
2. Improved precision of the average treatment effect estimator.
3. Scales to a large number of potential covariates without tuning parameters.

Abstract

In a randomized control trial, the precision of an average treatment effect estimator can be improved either by collecting data on additional individuals, or by collecting additional covariates that predict the outcome variable. We propose the use of pre-experimental data such as a census, or a household survey, to inform the choice of both the sample size and the covariates to be collected. Our procedure seeks to minimize the resulting average treatment effect estimator's mean squared error, subject to the researcher's budget constraint. We rely on a modification of an orthogonal greedy algorithm that is conceptually simple and easy to implement in the presence of a large number of potential covariates, and does not require any tuning parameters. In two empirical applications, we show that our procedure can lead to substantial gains of up to 58%, measured either in terms of reductions…
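The procedure described in the abstract can be illustrated with a simplified sketch. Several parts of it are assumptions rather than the authors' method: the selection rule (orthogonal greedy ordering by absolute correlation with the current residual, followed by choosing the path prefix with the smallest budget-adjusted variance proxy), the linear cost model, and the function name `greedy_covariate_path`. The paper's mean squared error criterion and stopping rule may differ. The authors' official implementation is the Matlab repository listed under Code & Models.

```python
import numpy as np


def greedy_covariate_path(X, y, costs, budget, base_cost, max_steps=None, p_treat=0.5):
    """Hypothetical cost-aware orthogonal greedy selection on pre-experimental data.

    Covariates enter in orthogonal-greedy order; the prefix of that path with
    the smallest budget-adjusted variance proxy is returned as
    (objective, selected column indices, affordable sample size).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)  # centre so no intercept column is needed
    yc = y - y.mean()
    norms = np.linalg.norm(Xc, axis=0)
    safe_norms = np.where(norms > 0, norms, np.inf)  # constant columns score 0
    max_steps = X.shape[1] if max_steps is None else max_steps

    def objective(sel, resid):
        # Affordable sample size: each respondent costs base_cost plus the
        # per-respondent cost of every selected covariate (assumed cost model).
        per_person = base_cost + sum(costs[j] for j in sel)
        n = int(budget // per_person)
        if n < 2:
            return np.inf, n
        # Residual variance over n * p * (1 - p): variance proxy for a
        # regression-adjusted difference-in-means estimator.
        return np.mean(resid ** 2) / (n * p_treat * (1 - p_treat)), n

    selected, resid = [], yc.copy()
    obj, n = objective(selected, resid)
    best = (obj, list(selected), n)
    for _ in range(max_steps):
        # Greedy step: pick the covariate most correlated with the residual.
        scores = np.abs(Xc.T @ resid) / safe_norms
        scores[selected] = -np.inf
        j = int(np.argmax(scores))
        if not np.isfinite(scores[j]) or scores[j] <= 1e-12:
            break
        selected.append(j)
        # Orthogonal step: refit y on all selected covariates by least squares.
        Z = Xc[:, selected]
        coef, *_ = np.linalg.lstsq(Z, yc, rcond=None)
        resid = yc - Z @ coef
        obj, n = objective(selected, resid)
        if obj < best[0]:
            best = (obj, list(selected), n)
    return best
```

Consider simulated pre-experimental data with one informative covariate and two pure-noise covariates. Each covariate costs 1 per respondent on top of a base cost of 10, and the budget is 10,000. In this setting, the sketch keeps only the informative covariate and an affordable sample of 909, because the noise covariates shrink the sample more than they reduce residual variance. Because every candidate is evaluated once per greedy step and no penalty parameter is tuned, the loop remains cheap even with many candidate covariates.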


Code & Models

Repositories

danielwilhelm/Matlab-data-coll (official)


Taxonomy

Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Statistical Methods and Bayesian Inference