Randomization Tests for Distributions of Individual Treatment Effects via Combined Rank Statistics
David Kim, Yongchang Su, Jake Bowers, Xinran Li

TL;DR
This paper introduces adaptive methods for testing the distribution of individual treatment effects in randomized experiments, combining multiple rank-based statistics to improve power without prior knowledge of the best test.
Contribution
It develops inference procedures that adaptively combine rank-based tests and weighting schemes for stratified experiments, maintaining validity and enhancing power.
Findings
The combined test achieves power comparable or superior to that of the best individual test.
An application to a teacher training program shows that roughly half of the teachers benefited.
Choice of test influences whether the program appears broadly successful or narrowly effective.
Abstract
What proportion of treated units actually benefited from an experimental intervention? What is the median or the largest individual treatment effect? This paper develops methods for answering such questions about the distribution of individual causal effects in randomized experiments. Existing approaches require the analyst to select a rank-based test statistic before observing the data. A poor choice can substantially reduce power, while searching over multiple test statistics and adjusting for multiplicity using Bonferroni correction also incurs power loss. We propose inference procedures that adaptively combine multiple rank-based statistics while maintaining finite-sample validity. For stratified experiments, we further develop weighting schemes that effectively aggregate evidence across strata of heterogeneous sizes. The resulting combined test achieves power comparable to, or…
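To make the idea of adaptively combining rank statistics concrete, the following is a minimal sketch and not the paper's procedure. It assumes a completely randomized experiment, tests only the sharp null hypothesis that no unit has any treatment effect, uses Stephenson-type rank scores with illustrative values of `s`, and combines the tests by taking the minimum p-value, calibrated against the same randomization draws. The function names `stephenson_scores` and `combined_rank_test` and all parameter defaults are hypothetical.

```python
import numpy as np
from math import comb


def stephenson_scores(n, s):
    # Score for rank r = 1..n is C(r-1, s-1); larger s puts more weight on
    # the largest outcomes. s = 2 gives r-1, a Wilcoxon-type score.
    return np.array([comb(r - 1, s - 1) for r in range(1, n + 1)], dtype=float)


def combined_rank_test(y, z, s_values=(2, 6, 10), n_draws=2000, seed=0):
    """Min-p combination of rank-sum tests under the sharp null of no effect.

    Assumptions (illustrative only): complete randomization, outcomes
    without ties, and a one-sided alternative of positive effects.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    n = len(y)

    # 0-based ranks of the outcomes (ties, if any, are broken by position).
    ranks = np.argsort(np.argsort(y))
    # One row of per-unit scores for each choice of s: shape (K, n).
    scores = np.stack([stephenson_scores(n, s)[ranks] for s in s_values])

    # Row 0 holds the observed statistics; the rest are randomization draws.
    stats = np.empty((n_draws + 1, len(s_values)))
    stats[0] = scores @ z
    for b in range(1, n_draws + 1):
        stats[b] = scores @ rng.permutation(z)

    # For every row and every statistic, the share of rows at least as large.
    total = stats.shape[0]
    pvals = np.empty_like(stats)
    for k in range(stats.shape[1]):
        col = stats[:, k]
        n_below = np.searchsorted(np.sort(col), col, side="left")
        pvals[:, k] = (total - n_below) / total

    # Calibrate the minimum p-value against its own randomization distribution,
    # so no Bonferroni-style adjustment is needed.
    min_p = pvals.min(axis=1)
    combined = float(np.mean(min_p <= min_p[0]))
    return {"individual": pvals[0], "combined": combined}
```

Because the observed assignment is treated as one of the `n_draws + 1` exchangeable rows, the combined p-value in this sketch is never smaller than the smallest individual p-value, which is the price paid for searching over several statistics. Extending such a test to quantiles of individual effects or to stratified designs, as the abstract describes, requires the paper's own constructions and is not attempted here.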
