Inference on Optimal Policy Values and Other Irregular Functionals via Softmax Smoothing
Justin Whitehouse, Qizhao Chen, Morgane Austern, Vasilis Syrgkanis

TL;DR
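This paper introduces a softmax smoothing estimator for inference on the value of an optimal treatment policy. The estimator handles the non-differentiability of the optimal-value functional without assuming away treatment non-response. It applies to both static and dynamic treatment regimes and offers efficiency benefits.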
Contribution
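It proposes a simple, efficient, and versatile softmax smoothing approach for inference on optimal policy values. The approach overcomes two limitations of existing methods: the assumption of zero treatment non-response, and the need to refit nuisance models a number of times proportional to the sample size.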
Findings
The estimator is statistically efficient when the probability of treatment non-response is zero.
It requires only a constant number of nuisance model fits, rather than a number that grows with the sample size.
It applies more generally to parameters defined through maxima of scores.
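To make "maxima of scores" and softmax smoothing concrete, here is a minimal sketch, not the authors' estimator. It assumes the smoothed maximum is the softmax-weighted average of the scores with inverse temperature beta; the exact smoothing used in the paper may differ. The names softmax_weights, smoothed_max and plug_in_value, and the predictions input, are hypothetical. The predictions stand in for the output of a single outcome-model fit, and the sketch omits any debiasing or cross-fitting step the paper may perform.

```python
import math
from typing import List, Optional, Sequence


def softmax_weights(scores: Sequence[float], beta: float) -> List[float]:
    """Weights exp(beta * s_a) / sum_b exp(beta * s_b), computed stably."""
    m = max(scores)  # subtract the max so exp never overflows
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_max(scores: Sequence[float], beta: float) -> float:
    """Softmax-weighted average of the scores (an assumed smoothing of max).

    beta = 0 gives the plain mean; as beta grows it approaches max(scores).
    """
    w = softmax_weights(scores, beta)
    return sum(wi * si for wi, si in zip(w, scores))


def plug_in_value(predictions: Sequence[Sequence[float]],
                  beta: Optional[float] = None) -> float:
    """Average over units of the (hard or smoothed) max across treatment arms.

    predictions[i][a] is a hypothetical fitted outcome prediction for unit i
    under arm a. beta=None uses the hard max; otherwise softmax smoothing.
    """
    per_unit = [max(p) if beta is None else smoothed_max(p, beta)
                for p in predictions]
    return sum(per_unit) / len(per_unit)


# Three units, two arms; the second unit is a "non-responder" (tied arms).
preds = [[1.0, 0.5], [0.2, 0.2], [0.0, 0.8]]
print(plug_in_value(preds))        # hard-max plug-in value
print(plug_in_value(preds, 50.0))  # smoothed value, close to the hard max
print(plug_in_value(preds, 0.0))   # beta = 0: average of per-unit arm means
```

Because a weighted average never exceeds its largest term, the smoothed value sits at or below the hard-max value. The inverse temperature beta trades the size of that gap against how sharply the weights change near ties.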
Abstract
Constructing confidence intervals for the value of an (unknown) optimal treatment policy is a fundamental problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches for performing inference fail to be directly applicable. Many existing works circumvent non-differentiability by making the unrealistic assumption of zero probability of treatment non-response, i.e. that every unit responds (either positively or negatively) to an assigned treatment. Further, works that don't circumvent this restriction rely on refitting nuisance models a number of times proportional to the sample size. In this paper, we construct and analyze a simple, softmax smoothing-based estimator for the value…
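The non-differentiability mentioned in the abstract can be seen for a single unit with two arms. Write the treatment effect as tau = mu(x,1) - mu(x,0); non-response means tau = 0. Relative to control, the optimal per-unit value is max(0, tau), which has a kink at tau = 0. The sketch below compares one-sided finite-difference slopes of that hard maximum with a softmax-smoothed version, tau * sigmoid(beta * tau), which is the two-arm softmax-weighted average of the scores (0, tau). The smoothing form and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hard_value(tau: float) -> float:
    """Per-unit gain from choosing the better arm: max(0, tau)."""
    return max(0.0, tau)


def smooth_value(tau: float, beta: float = 10.0) -> float:
    """Softmax-weighted average of scores (0, tau): tau * sigmoid(beta * tau)."""
    return tau * sigmoid(beta * tau)


def one_sided_slopes(f, x: float, h: float = 1e-6):
    """Left and right finite-difference slopes of f at x."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


# At a non-responder (tau = 0) the hard max has slopes 0 and 1 (a kink),
# while the smoothed version has slope about 1/2 from both sides.
print(one_sided_slopes(hard_value, 0.0))
print(one_sided_slopes(smooth_value, 0.0))
```

Only the hard maximum shows a jump in slope at tau = 0. This jump is what blocks standard semi-parametric (influence-function) arguments at non-responders, and smoothing removes it.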
