Discussion of Kallus (2020) and Mo, Qi, and Liu (2020): New Objectives for Policy Learning
Sijia Li, Xiudi Li, Alex Luedtke

TL;DR
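This discussion examines the objective functions for policy learning recently proposed by Kallus (2020) and by Mo, Qi, and Liu (2020). It argues that the curvature of the value function must be taken into account within the retargeting framework, and it describes more efficient ways to use calibration data when learning distributionally robust policies.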
Contribution
It introduces two ways to account for the curvature of the value function within the retargeting framework, and it describes more efficient approaches for leveraging calibration data when learning distributionally robust policies.
Findings
Within the retargeting framework, it is important to take the curvature of the value function into account.
Two ways of accounting for this curvature are introduced.
Calibration data can be leveraged more efficiently when learning distributionally robust policies.
Abstract
We discuss the thought-provoking new objective functions for policy learning that were proposed in "More efficient policy learning via optimal retargeting" by Nathan Kallus and "Learning optimal distributionally robust individualized treatment rules" by Weibin Mo, Zhengling Qi, and Yufeng Liu. We show that it is important to take the curvature of the value function into account when working within the retargeting framework, and we introduce two ways to do so. We also describe more efficient approaches for leveraging calibration data when learning distributionally robust policies.
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Health Systems, Economic Evaluations, Quality of Life
