Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning

Zhishuai Liu; Weixin Wang; Pan Xu

arXiv:2409.20521 · cs.LG · October 1, 2024

Open Access

TL;DR

This paper introduces a new algorithm for distributionally robust off-dynamics reinforcement learning that achieves near-optimal performance bounds and significantly improves computational efficiency compared to previous methods.

Contribution

The paper proposes We-DRIVE-U, a novel algorithm with improved theoretical guarantees and reduced computational complexity for robust RL under uncertain transition dynamics.

Findings

01

Achieves near-optimal suboptimality bounds

02

Constructs a hard instance and derives a lower bound, showing near-optimality

03

Reduces policy switch and oracle call complexities from $O(K)$ to $O(dH \log K)$
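
To make the scaling in finding 03 concrete, the following minimal sketch compares a switch count that grows linearly in the number of episodes K with one that grows like dH log(K). It is only an illustration of the two growth rates: the function names are hypothetical, constants hidden by the big-O notation are set to 1, and the natural logarithm is an assumption.

```python
import math


def switches_linear(K: int) -> float:
    """Switch count that grows like K (one switch per episode; constant assumed 1)."""
    return float(K)


def switches_logarithmic(d: int, H: int, K: int) -> float:
    """Switch count that grows like d * H * log(K) (constant assumed 1, natural log assumed)."""
    return d * H * math.log(K)


if __name__ == "__main__":
    d, H = 10, 20  # illustrative feature dimension and horizon, not values from the paper
    for K in (10**3, 10**5, 10**7):
        linear = switches_linear(K)
        rare = switches_logarithmic(d, H, K)
        print(f"K={K:>9}  linear={linear:>12.0f}  d*H*log(K)={rare:>10.1f}")
```

With these illustrative values the logarithmic count exceeds K when K is small and falls far below it as K grows, so the saving is an asymptotic one in the number of episodes.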

Abstract

We study off-dynamics reinforcement learning (RL), where the policy training and deployment environments are different. To deal with this environmental perturbation, we focus on learning policies robust to uncertainties in transition dynamics under the framework of distributionally robust Markov decision processes (DRMDPs), where the nominal and perturbed dynamics are linear Markov decision processes. We propose a novel algorithm, We-DRIVE-U, that enjoys an average suboptimality of $O(dH \cdot \min\{1/\rho, H\}/K)$, where $K$ is the number of episodes, $H$ is the horizon length, $d$ is the feature dimension, and $\rho$ is the uncertainty level. This result improves the state of the art by $O(dH/\min\{1/\rho, H\})$. We also construct a novel hard instance and derive the first information-theoretic lower bound in this setting, which indicates…
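
The factor $\min\{1/\rho, H\}$ in the abstract's bound has two regimes, which a small numeric sketch can make visible. The helper names below are hypothetical and the values of d, H, and the uncertainty level are chosen only for illustration; constants hidden by the big-O notation are ignored.

```python
def robustness_factor(rho: float, H: int) -> float:
    """Return min{1/rho, H}, the uncertainty-dependent factor in the stated bound."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return min(1.0 / rho, H)


def improvement_factor(d: int, H: int, rho: float) -> float:
    """Return d * H / min{1/rho, H}, the stated improvement over prior work (constants ignored)."""
    return d * H / robustness_factor(rho, H)


if __name__ == "__main__":
    d, H = 10, 20  # illustrative values, not taken from the paper
    for rho in (0.01, 0.05, 0.5, 1.0):
        print(
            f"rho={rho:<5} min(1/rho, H)={robustness_factor(rho, H):<6} "
            f"d*H/min(1/rho, H)={improvement_factor(d, H, rho)}"
        )
```

When the uncertainty level is at least 1/H, the factor equals 1/rho, so a larger uncertainty level shrinks it; below 1/H it is capped at the horizon H.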

Taxonomy

Topics: Traffic control and management · Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety

Methods: Focus