Externally Valid Selection of Experimental Sites via the k-Median Problem
José Luis Montiel Olea, Brenda Prallon, Chen Qiu, Jörg Stoye, and Yiwei Sun

TL;DR
This paper links the problem of selecting experimental sites for external validity to the k-median problem, providing a decision-theoretic foundation and practical algorithms for optimal site selection.
Contribution
It introduces a decision-theoretic approach to site selection using the k-median problem, with conditions for optimality and empirical applications demonstrating its effectiveness.
Findings
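Under stated conditions, minimizing worst-case, welfare-based regret over nonrandom site-selection schemes is approximately (and sometimes exactly) equivalent to choosing the k most central vectors of baseline site-level covariates.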
The k-median formulation can be solved via linear integer programming.
Two empirical applications illustrate the theoretical and computational benefits of the procedure.
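For reference, the textbook integer linear program for the k-median problem is sketched below. The notation is an assumption made for illustration, not the paper's own: d_{ij} is a dissimilarity between the covariate vectors of sites i and j, y_j indicates that site j is selected for experimentation, and x_{ij} indicates that site i is represented by selected site j. The paper's exact objective and any site weights may differ.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{n} x_{ij} = 1, && i = 1,\dots,n, \\
& x_{ij} \le y_j, && i, j = 1,\dots,n, \\
& \sum_{j=1}^{n} y_j = k, \\
& x_{ij},\, y_j \in \{0,1\}, && i, j = 1,\dots,n.
\end{aligned}
```

The first constraint assigns every site to exactly one representative, the second allows assignment only to selected sites, and the third fixes the number of experimental sites at k.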
Abstract
We present a decision-theoretic justification for viewing the question of how to best choose where to experiment in order to optimize external validity as a k-median problem, a popular problem in computer science and operations research. We present conditions under which minimizing the worst-case, welfare-based regret among all nonrandom schemes that select sites to experiment is approximately equal, and sometimes exactly equal, to finding the k most central vectors of baseline site-level covariates. The k-median problem can be formulated as a linear integer program. Two empirical applications illustrate the theoretical and computational benefits of the suggested procedure.
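To make "the k most central vectors" concrete, here is a minimal sketch that solves a tiny k-median instance by exhaustive search. It is not the authors' implementation: the function name `k_median_brute_force`, the use of Euclidean distance, the equal weighting of sites, and the toy covariate vectors are all assumptions chosen for illustration. Exhaustive search is only feasible for small numbers of sites; larger instances would call for the integer-programming formulation mentioned in the abstract.

```python
from itertools import combinations
import math


def k_median_brute_force(points, k):
    """Return (indices, cost) of the k points minimizing the total distance
    from every point to its nearest selected point.

    Assumptions (hypothetical, for illustration): Euclidean distance and
    equal weight on every site.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")

    # Pairwise Euclidean distances between covariate vectors.
    dist = [[math.dist(p, q) for q in points] for p in points]

    best_cost = math.inf
    best_subset = None
    # Enumerate every size-k subset of candidate sites.
    for subset in combinations(range(n), k):
        # Each site is represented by its nearest selected site.
        cost = sum(min(dist[i][j] for j in subset) for i in range(n))
        if cost < best_cost:
            best_cost, best_subset = cost, subset
    return best_subset, best_cost


# Toy covariate vectors for five hypothetical sites forming two clusters.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
chosen, cost = k_median_brute_force(sites, 2)
print(chosen, cost)
```

With k = 2, the search picks one representative from each cluster. Ties between equally good subsets are broken in favor of the subset enumerated first.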
Taxonomy
Topics: Manufacturing Process and Optimization · Advanced Statistical Process Monitoring · Fault Detection and Control Systems
