Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Rongpeng Li

arXiv:2106.08477·cs.LG·June 17, 2021

TL;DR

This paper introduces a variation-aware reinforcement learning algorithm for Markov decision processes with both endogenous and exogenous uncertainties, providing new regret bounds under dynamic variation constraints.

Contribution

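The paper develops VB-UCRL, a variation-aware Bernstein-based upper confidence reinforcement learning algorithm that restarts on a variation-dependent schedule. It handles exogenous uncertainty and establishes improved regret bounds compared to existing methods.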

Findings

01

Regret bound saving at most S or S^{1/6}T^{1/12}

02

Successfully addresses challenges from exogenous uncertainty

03

Provides theoretical guarantees for RL in uncertain environments

Abstract

Online reinforcement learning (RL) has been widely applied in information processing scenarios, which usually exhibit considerable uncertainty due to the intrinsic randomness of channels and service demands. In this paper, we consider undiscounted RL in general Markov decision processes (MDPs) with both endogenous and exogenous uncertainty, where both the rewards and the state transition probabilities are unknown to the RL agent and evolve over time, as long as their respective variations do not exceed a certain dynamic budget (i.e., an upper bound). We first develop a variation-aware Bernstein-based upper confidence reinforcement learning (VB-UCRL) algorithm, which we allow to restart according to a schedule dependent on the variations. We successfully overcome the challenges due to the exogenous uncertainty and establish a regret bound saving at most $S$ or $S^{\frac{1}{6}} T^{\frac{1}{12}}$ …
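To make the two ingredients named in the abstract more concrete, here is a minimal Python sketch of a Bernstein-style confidence bound combined with periodic restarts. It is an illustration under stated assumptions, not the paper's VB-UCRL: the radius uses a generic empirical-Bernstein form with textbook constants, the restart period is a fixed hypothetical parameter (the paper ties its schedule to the variation budget), and the names `bernstein_radius` and `RestartingEstimator` are invented for this example.

```python
import math


def bernstein_radius(variance, n, delta, value_range=1.0):
    """Generic empirical-Bernstein confidence radius for n samples.

    Assumption: textbook constants, not those used in the paper.
    """
    if n == 0:
        return value_range  # no data yet: fall back to the full range
    log_term = math.log(3.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + 3.0 * value_range * log_term / n


class RestartingEstimator:
    """Running reward statistics that are discarded every `restart_period` steps.

    Assumption: a fixed period stands in for a variation-dependent schedule.
    """

    def __init__(self, restart_period):
        self.restart_period = restart_period
        self.t = 0  # total number of updates seen so far
        self._reset()

    def _reset(self):
        self.n = 0       # samples since the last restart
        self.mean = 0.0  # running mean since the last restart
        self.m2 = 0.0    # running sum of squared deviations (Welford)

    def update(self, reward):
        # Forget old samples at the start of each new period, so stale
        # rewards from a drifted environment stop biasing the estimate.
        if self.t > 0 and self.t % self.restart_period == 0:
            self._reset()
        self.t += 1
        self.n += 1
        d = reward - self.mean
        self.mean += d / self.n
        self.m2 += d * (reward - self.mean)

    def upper_bound(self, delta=0.05):
        """Optimistic reward estimate: mean plus Bernstein radius."""
        var = self.m2 / self.n if self.n > 0 else 0.0
        return self.mean + bernstein_radius(var, self.n, delta)
```

The restart trades statistical accuracy for freshness: a shorter period tracks a changing environment more closely but leaves fewer samples per estimate, which widens the confidence radius.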

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Advanced Bandit Algorithms Research