Large Scale Longitudinal Experiments: Estimation and Inference
Apoorva Lal, Alexander Fischer, Matthew Wardrop

TL;DR
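This paper makes panel regressions on large-scale experiments computationally tractable. Using Mundlak's insight that unit intercepts can be eliminated with carefully chosen averages of the regressors, it rewrites several common estimators as weighted least squares with frequency weights on compressed data, and provides Python libraries that implement them.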
Contribution
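The paper eliminates the millions of nuisance parameters (unit intercepts) that arise in large panel regressions by replacing them with averages of the regressors, which improves both computational efficiency and the precision of the estimates. Implementations are provided in the Python libraries pyfixest (in-memory) and duckreg (out-of-memory).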
Findings
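The proposed methods yield more precise estimates than other commonly used estimators.
The compression strategy greatly increases computational efficiency.
Regressions with arbitrary strata intercepts become tractable on very large datasets, with the compression step optionally computed out-of-memory in SQL.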
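The compression idea behind these findings can be illustrated with a minimal sketch. It assumes discrete regressors so that many rows share the same design-matrix row; the variable names and simulated data are hypothetical, and this is plain NumPy rather than the pyfixest or duckreg API. Each distinct regressor row is stored once with its count and outcome sum, and weighted least squares with those counts as frequency weights reproduces the OLS coefficients from the full data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical discrete regressors: a binary treatment and a 3-level stratum.
treat = rng.integers(0, 2, size=n)
stratum = rng.integers(0, 3, size=n)
y = 1.0 + 0.5 * treat + 0.3 * stratum + rng.normal(size=n)

X = np.column_stack([np.ones(n), treat, stratum])

# Reference: OLS on all n rows.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Compression: one record per distinct regressor row, keeping the
# row count and the sum of the outcome within each group.
X_c, inverse = np.unique(X, axis=0, return_inverse=True)
inverse = inverse.ravel()
counts = np.bincount(inverse)
y_sum = np.bincount(inverse, weights=y)

# Frequency-weighted least squares on the compressed records:
# solve (X_c' W X_c) b = X_c' W y_mean, where W = diag(counts)
# and W y_mean equals the per-group outcome sums.
XtWX = X_c.T @ (counts[:, None] * X_c)
XtWy = X_c.T @ y_sum
beta_compressed = np.linalg.solve(XtWX, XtWy)

print(X_c.shape[0], "compressed rows instead of", n)
print(np.allclose(beta_full, beta_compressed))
```

Only point estimates are shown here. Standard errors would also need per-group sums of squared outcomes, which this sketch does not compute.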
Abstract
Large-scale randomized experiments are seldom analyzed using panel regression methods because of computational challenges arising from the presence of millions of nuisance parameters. We leverage Mundlak's insight that unit intercepts can be eliminated by using carefully chosen averages of the regressors to rewrite several common estimators in a form that is amenable to weighted least squares estimation with frequency weights. This renders regressions involving arbitrary strata intercepts tractable with very large datasets, optionally with the key compression step computed out-of-memory in SQL. We demonstrate that these methods yield more precise estimates than other commonly used estimators, and also find that the compression strategy greatly increases computational efficiency. We provide in-memory (pyfixest) and out-of-memory (duckreg) Python libraries to implement these estimators.
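Mundlak's insight, as described in the abstract, can be checked numerically in the simplest one-way case. The sketch below is an illustration under stated assumptions (one regressor, unit intercepts only, simulated data with hypothetical names), not the paper's full set of estimators. Adding the unit mean of the regressor to a pooled regression recovers the same slope as the within (fixed-effects) estimator, without estimating one intercept per unit.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 200, 5
unit = np.repeat(np.arange(n_units), n_periods)

# Unit intercepts that are correlated with the regressor, so that
# ignoring them would bias a pooled regression of y on x.
alpha = rng.normal(size=n_units)
x = alpha[unit] + rng.normal(size=unit.size)
y = 2.0 * x + alpha[unit] + rng.normal(size=unit.size)

# Unit-level means, broadcast back to the observation level.
n_obs = np.bincount(unit)
x_bar = (np.bincount(unit, weights=x) / n_obs)[unit]
y_bar = (np.bincount(unit, weights=y) / n_obs)[unit]

# Within estimator: demean by unit, then regress.
x_w, y_w = x - x_bar, y - y_bar
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Mundlak regression: y on an intercept, x, and the unit mean of x.
Z = np.column_stack([np.ones_like(x), x, x_bar])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_mundlak = coef[1]

print(np.isclose(beta_within, beta_mundlak))
```

The Mundlak design has three columns regardless of the number of units, and the unit mean takes only as many distinct values as there are units, which is what makes the regression a candidate for the frequency-weighted compression described in the abstract.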
Taxonomy
Topics: Optimal Experimental Design Methods · Advanced Causal Inference Techniques · Statistical Methods in Clinical Trials
