Policy Evaluation and Optimization with Continuous Treatments
Nathan Kallus, Angela Zhou

TL;DR
This paper extends policy evaluation and learning methods to continuous treatments using kernel functions, enabling effective policy optimization in settings like personalized dosing, and demonstrates superior performance over discretization methods.
Contribution
It introduces a kernel-based extension of IPW and DR methods for continuous treatments, providing a consistent estimator and a policy optimizer with theoretical guarantees.
Findings
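The kernel-based estimator outperforms a discretization-based benchmark for policy evaluation.
The continuous policy optimizer (CPO) built on this estimator achieves convergent regret.
In a case study on personalized dosing, the learned policy outperforms benchmarks.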
Abstract
We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments. Previous work for discrete treatment/action spaces focuses on inverse probability weighting (IPW) and doubly robust (DR) methods that use a rejection sampling approach for evaluation and the equivalent weighted classification problem for learning. In the continuous setting, this reduction fails as we would almost surely reject all observations. To tackle the case of continuous treatments, we extend the IPW and DR approaches to the continuous setting using a kernel function that leverages treatment proximity to attenuate discrete rejection. Our policy estimator is consistent and we characterize the optimal bandwidth. The resulting continuous policy optimizer (CPO) approach using our estimator achieves convergent…
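To make the kernel idea in the abstract concrete, here is a minimal sketch of a kernel-weighted IPW value estimate. It is an illustration under stated assumptions, not the authors' implementation: the function names (`epanechnikov`, `kernel_ipw_value`), the choice of the Epanechnikov kernel, the synthetic data-generating process, and the bandwidth `h=0.3` are all hypothetical, and the logging density is assumed known rather than estimated. Each observation is weighted by how close its observed treatment is to the treatment the policy would have assigned, instead of being kept only on an exact match.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def kernel_ipw_value(policy_t, t, y, density, h, kernel=epanechnikov):
    """Kernel-weighted IPW estimate of a policy's value (illustrative sketch).

    policy_t : treatment the policy would assign to each unit
    t        : treatment actually observed in the logged data
    y        : observed outcome
    density  : logging density of the observed treatment given covariates
    h        : kernel bandwidth (> 0)
    """
    # Nearby treatments get positive weight; far ones get zero.
    weights = kernel((policy_t - t) / h) / (h * density)
    return float(np.mean(weights * y))


# Synthetic logged data (assumed setup, for illustration only).
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0.0, 1.0, size=n)                 # covariate
t = rng.normal(loc=x, scale=1.0)                  # logged treatment ~ N(x, 1)
y = -(t - 2.0 * x) ** 2 + rng.normal(scale=0.1, size=n)  # outcome peaks at t = 2x
density = np.exp(-0.5 * (t - x) ** 2) / np.sqrt(2.0 * np.pi)

# Policy to evaluate: assign treatment 2x, whose true value is 0 in this toy setup.
policy_t = 2.0 * x

# Exact-match rejection sampling keeps essentially no continuous observations.
exact_matches = int(np.sum(t == policy_t))

# The kernel estimate uses nearby treatments; smoothing adds a small bias
# that shrinks as the bandwidth h decreases.
estimate = kernel_ipw_value(policy_t, t, y, density, h=0.3)
print(exact_matches, estimate)
```

In this sketch a smaller `h` reduces smoothing bias but leaves fewer observations with nonzero weight, which raises variance; that trade-off is what a bandwidth choice has to balance.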
Taxonomy
Topics: Advanced Causal Inference Techniques · Machine Learning in Healthcare · Advanced Bandit Algorithms Research
