Selective Uncertainty Propagation in Offline RL

Sanath Kumar Krishnamurthy; Tanmay Gangwani; Sumeet Katariya; Branislav Kveton; Shrey Modi; Anshuka Rangi

arXiv:2302.00284 · cs.LG · January 22, 2025

TL;DR

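This paper introduces selective uncertainty propagation, a flexible method for constructing confidence intervals in finite-horizon offline reinforcement learning. The method adapts to how hard the distribution shift is at each step of policy evaluation, and the resulting intervals are used to improve offline policy learning.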

Contribution

It proposes selective uncertainty propagation, a novel approach for confidence interval construction that adapts to the difficulty of distribution shift in offline RL.

Findings

01 Demonstrates benefits on toy environments
02 Improves offline policy learning performance
03 Adapts to varying distribution shift challenges

Abstract

We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step h in dynamic programming (DP) algorithms. To learn this, it is sufficient to evaluate the treatment effect of deviating from the behavioral policy at step h after having optimized the policy for all future steps. Since the policy at any step can affect next-state distributions, the related distributional shift challenges can make this problem far more statistically hard than estimating such treatment effects in the stochastic contextual bandit setting. However, the hardness of many real-world RL instances lies between the two regimes. We develop a flexible and general method called selective uncertainty propagation for confidence interval construction that adapts to the hardness of the associated distribution shift challenges. We show benefits…
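The dynamic programming step described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes a small tabular problem with a fully known model (no offline data and no confidence intervals), and every name in it (`backward_induction`, `P`, `R`, `behavior`) is hypothetical. It only shows why the treatment effect of deviating from the behavioral policy at step h, with all later steps already optimized, is enough to choose the action at step h: subtracting a baseline that does not depend on the action leaves the best action unchanged.

```python
def backward_induction(P, R, behavior, H):
    """Tabular finite-horizon DP on a known model (illustrative assumption).

    P[s][a]        : list of next-state probabilities
    R[s][a]        : expected immediate reward
    behavior[s][a] : action probabilities of the behavioral policy
    The same P, R, behavior are reused at every step for brevity.
    Returns the greedy policy and the treatment effects for each step.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states          # value after the final step
    policy = [None] * H
    effects = [None] * H
    for h in reversed(range(H)):
        # Q[s][a]: take a at step h, then follow the already optimized later steps
        Q = [[R[s][a] + sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_actions)] for s in range(n_states)]
        # Baseline: follow the behavioral policy at step h only
        base = [sum(behavior[s][a] * Q[s][a] for a in range(n_actions))
                for s in range(n_states)]
        # Treatment effect of deviating from the behavioral policy at step h
        effects[h] = [[Q[s][a] - base[s] for a in range(n_actions)]
                      for s in range(n_states)]
        # Maximizing the effect is the same as maximizing Q
        policy[h] = [max(range(n_actions), key=lambda a: effects[h][s][a])
                     for s in range(n_states)]
        V = [Q[s][policy[h][s]] for s in range(n_states)]
    return policy, effects


# Toy instance: 2 states, 2 actions, horizon 2, uniform behavioral policy.
R = [[0.0, 1.0], [0.25, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],   # from state 0: action 0 stays, action 1 moves
     [[0.0, 1.0], [1.0, 0.0]]]   # from state 1: action 0 stays, action 1 moves
behavior = [[0.5, 0.5], [0.5, 0.5]]

policy, effects = backward_induction(P, R, behavior, H=2)
print(policy)
print(effects[0])
```

In this toy instance the action chosen in state 1 differs between the first and last step, because at the first step the action also determines the next state. That dependence on next-state distributions is what separates the RL problem from the contextual bandit case mentioned in the abstract.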

Taxonomy

Topics: Digital Filter Design and Implementation