Distributionally Robust Policy Learning under Concept Drifts
Jingyuan Wang, Zhimei Ren, Ruohan Zhan, and Zhengyuan Zhou

TL;DR
This paper develops a distributionally robust policy learning framework for concept drift, where only the conditional distribution of the outcome given the covariates changes. It provides a doubly-robust policy evaluation estimator and a policy learning algorithm with theoretical optimality guarantees, validated empirically.
Contribution
It introduces a doubly-robust estimator of a policy's worst-case value under concept drift and proposes a policy learning algorithm with theoretical optimality guarantees.
Findings
The policy value estimator is asymptotically normal even when the nuisance parameters are estimated at slower-than-root-n rates.
The policy learning algorithm achieves a sub-optimality gap of order n^{-1/2}.
Numerical studies show substantial improvements over benchmarks.
Abstract
Distributionally robust policy learning aims to find a policy that performs well under the worst-case distributional shift, and yet most existing methods for robust policy learning consider the worst-case joint distribution of the covariate and the outcome. The joint-modeling strategy can be unnecessarily conservative when we have more information on the source of distributional shifts. This paper studies a more nuanced problem: robust policy learning under concept drift, when only the conditional relationship between the outcome and the covariate changes. To this end, we first provide a doubly-robust estimator for evaluating the worst-case average reward of a given policy under a set of perturbed conditional distributions. We show that the policy value estimator enjoys asymptotic normality even if the nuisance parameters are estimated at a slower-than-root-n rate. We then…
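To illustrate why joint modeling can be more conservative than modeling concept drift alone, here is a minimal sketch under assumptions of our own: the uncertainty set is taken to be a KL-divergence ball of radius delta (this page does not state the paper's exact divergence). Under a joint shift, the worst-case mean reward has the standard dual sup over alpha > 0 of -alpha log E[exp(-Y/alpha)] - alpha delta. Under concept drift, with P_X held fixed and the average conditional KL bounded by delta, the logarithm moves inside the expectation over X. The code evaluates both duals by grid search on a discrete covariate using plug-in group means. It is not the paper's doubly-robust estimator, and it treats Y as rewards already observed under the target policy, so no off-policy correction appears. The function names, the simulated data, and the values of delta and the alpha grid are hypothetical.

```python
import numpy as np


def log_mean_exp(z):
    """Numerically stable log(mean(exp(z)))."""
    m = np.max(z)
    return m + np.log(np.mean(np.exp(z - m)))


def worst_case_joint(y, delta, alphas):
    """Joint shift, KL ball (assumed): sup_a { -a*log E[exp(-Y/a)] - a*delta }."""
    return max(-a * log_mean_exp(-y / a) - a * delta for a in alphas)


def worst_case_conditional(x, y, delta, alphas):
    """Concept drift (assumed KL form): P_X fixed, E_X[KL(Q_{Y|X} || P_{Y|X})] <= delta.

    Dual: sup_a { -a * E_X[log E[exp(-Y/a) | X]] - a*delta }, with the
    conditional expectation replaced by a plug-in mean within each group of x.
    This is a plug-in sketch, not a doubly-robust estimator.
    """
    groups = np.unique(x)
    weights = np.array([np.mean(x == g) for g in groups])  # empirical P_X
    best = -np.inf
    for a in alphas:
        cond = np.array([log_mean_exp(-y[x == g] / a) for g in groups])
        best = max(best, -a * np.dot(weights, cond) - a * delta)
    return best


# Hypothetical simulation: the reward depends strongly on a 3-level covariate.
rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=2000)
y = x + rng.normal(size=2000)
alphas = np.logspace(-2, 2, 200)  # grid over the dual variable alpha
delta = 0.1                       # hypothetical KL radius

wc_joint = worst_case_joint(y, delta, alphas)
wc_cond = worst_case_conditional(x, y, delta, alphas)
print(f"mean reward:               {y.mean():.3f}")
print(f"worst case, joint shift:   {wc_joint:.3f}")
print(f"worst case, concept drift: {wc_cond:.3f}")
```

Because the concept-drift set keeps P_X fixed, it is contained in the joint KL ball of the same radius; on the grid, Jensen's inequality gives the same ordering for every alpha, so the concept-drift value should never fall below the joint one, and neither can exceed the sample mean of Y.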
Taxonomy
Topics: Data Stream Mining Techniques · Water resources management and optimization · Advanced Bandit Algorithms Research
Methods: Sparse Evolutionary Training
