Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning
Zhishuai Liu, Weixin Wang, Pan Xu

TL;DR
This paper introduces a new algorithm for distributionally robust off-dynamics reinforcement learning that achieves near-optimal performance bounds and significantly improves computational efficiency compared to previous methods.
Contribution
The paper proposes We-DRIVE-U, a novel algorithm with improved theoretical guarantees and reduced computational complexity for robust RL under uncertain transition dynamics.
Findings
Achieves a near-optimal average suboptimality bound
Constructs a hard instance and derives an information-theoretic lower bound, showing that the algorithm is near-optimal
Reduces policy switch and oracle call complexities from O(K) to O(dH log(K))
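To give a feel for how a logarithmic number of policy switches can arise, here is a minimal sketch of a determinant-doubling update rule, a standard device in the low-switching linear bandit and RL literature. This summary does not state which switching criterion We-DRIVE-U uses, so the rule, the random unit-norm features, and the names `count_switches` and `switch_bound` are all assumptions made for illustration. The sketch covers a single step of the horizon; running one such rule per step h is what would introduce the factor H.

```python
import math
import numpy as np

def switch_bound(K, d):
    """Upper bound d * log2(1 + K/d) on switches under determinant doubling,
    valid when every feature vector has norm at most 1 (assumed here)."""
    return d * math.log2(1 + K / d)

def count_switches(K, d, seed=0):
    """Count policy updates over K episodes under a hypothetical
    determinant-doubling rule (illustrative, not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    cov = np.eye(d)               # regularized feature covariance, starts at I
    logdet_at_switch = 0.0        # log det(I) = 0
    switches = 0
    for _ in range(K):
        phi = rng.normal(size=d)
        phi /= max(1.0, np.linalg.norm(phi))   # enforce ||phi|| <= 1
        cov += np.outer(phi, phi)              # rank-one covariance update
        _, logdet = np.linalg.slogdet(cov)
        # Switch only when det(cov) has more than doubled since the last switch.
        if logdet > logdet_at_switch + math.log(2.0):
            switches += 1
            logdet_at_switch = logdet
    return switches

if __name__ == "__main__":
    K, d = 2000, 4
    print("episodes:", K)
    print("switches:", count_switches(K, d))
    print("bound   :", switch_bound(K, d))
```

Because the determinant of the covariance is at most (1 + K/d)^d under the norm assumption, it can double only about d log K times, whereas updating the policy after every episode costs K switches.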
Abstract
We study off-dynamics reinforcement learning (RL), where the policy training and deployment environments are different. To deal with this environmental perturbation, we focus on learning policies robust to uncertainties in transition dynamics under the framework of distributionally robust Markov decision processes (DRMDPs), where the nominal and perturbed dynamics are linear Markov decision processes. We propose a novel algorithm, We-DRIVE-U, that enjoys an average suboptimality bound depending on K, H, d, and ρ, where K is the number of episodes, H is the horizon length, d is the feature dimension, and ρ is the uncertainty level. This result improves on the state of the art. We also construct a novel hard instance and derive the first information-theoretic lower bound in this setting, which indicates…
Taxonomy
Topics: Traffic control and management · Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety
