On the Lower Confidence Band for the Optimal Welfare in Policy Learning
Kirill Ponomarev, Vira Semenova

TL;DR
This paper develops lower confidence bands (LCBs) for the optimal welfare in policy learning that remain valid when the optimal treatment policy is not well separated from competing policies, a setting in which standard debiased inference fails, and validates them empirically on a real study.
Contribution
The paper constructs lower confidence bands analytically by inverting moment-inequality tests; the resulting bands are uniformly valid even when standard debiased inference methods do not apply.
Findings
Proposed LCBs achieve reliable coverage in empirical tests.
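An LCB obtained by inverting a one-sided t-test on an efficient estimator of the optimal welfare can be first-order dominated by an LCB based on the welfare estimate for a suitable suboptimal policy.
Such dominance is possible if and only if the optimal policy is not well separated from the rest, that is, when the margin condition fails; the proposed LCBs remain valid in this case.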
Abstract
We study inference on the optimal welfare in a policy learning problem and propose reporting a lower confidence band (LCB). A natural approach to constructing an LCB is to invert a one-sided t-test based on an efficient estimator for the optimal welfare. However, we show that for an empirically relevant class of DGPs, such an LCB can be first-order dominated by an LCB based on a welfare estimate for a suitable suboptimal treatment policy. We show that such first-order dominance is possible if and only if the optimal treatment policy is not "well-separated" from the rest, in the sense of the commonly imposed margin condition. When this condition fails, standard debiased inference methods are not applicable. We show that uniformly valid and easy-to-compute LCBs can be constructed analytically by inverting moment-inequality tests with the maximum and quasi-likelihood-ratio test…
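To make the two constructions in the abstract concrete, here is a minimal Python sketch. It is not the authors' procedure: the function names, the finite set of candidate policies, the normal approximation, and the Bonferroni critical value used for the max-type bound are all assumptions introduced for illustration, and the welfare estimates and standard errors are made-up numbers. The first function inverts a one-sided t-test; the second takes the largest per-policy lower bound, which is what inverting a max-statistic test of "every policy's welfare is at most w" yields under these assumptions.

```python
from statistics import NormalDist


def lcb_one_sided_t(estimate, std_error, alpha=0.05):
    """LCB from inverting a one-sided t-test (normal approximation)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return estimate - z * std_error


def lcb_max_bonferroni(estimates, std_errors, alpha=0.05):
    """Hypothetical max-type LCB over a finite set of candidate policies.

    Rejects "welfare_j <= w for all j" when max_j (estimate_j - w) / se_j
    exceeds c; the LCB is the smallest w that is not rejected, which equals
    max_j (estimate_j - c * se_j). The Bonferroni value c is an assumption
    of this sketch, chosen because it needs no correlation estimates.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need equally sized, non-empty inputs")
    c = NormalDist().inv_cdf(1.0 - alpha / len(estimates))
    return max(w - c * s for w, s in zip(estimates, std_errors))


# Made-up numbers: policy 0 is assumed optimal but noisily estimated,
# policy 1 is slightly worse but estimated much more precisely.
welfare_hat = [1.00, 0.98]
se_hat = [0.20, 0.05]

lcb_t = lcb_one_sided_t(welfare_hat[0], se_hat[0])
lcb_max = lcb_max_bonferroni(welfare_hat, se_hat)
print(f"t-test LCB for the optimal policy: {lcb_t:.3f}")
print(f"max-type LCB over both policies:   {lcb_max:.3f}")
```

With these illustrative inputs the max-type bound is driven by the precisely estimated near-optimal policy and exceeds the t-test bound, which is the kind of situation the abstract describes when the optimal policy is not well separated from the rest.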
Taxonomy
Topics: Stochastic processes and financial applications
