Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

Liyu Chen; Haipeng Luo

arXiv:2205.13044·cs.LG·May 27, 2022

TL;DR

This paper studies goal-oriented reinforcement learning in environments whose costs and transitions change over time. It establishes a lower bound on dynamic regret and designs algorithms that adapt to unknown amounts of change while achieving near-optimal regret.

Contribution

It introduces the first lower bounds for non-stationary goal-oriented RL and develops near-optimal algorithms that adapt to unknown environment changes.

Findings

01

Established a lower bound on dynamic regret in non-stationary environments.

02

Designed algorithms that achieve near-optimal regret bounds.

03

Extended the methods to settings where the amount of change in the environment is unknown.

Abstract

We initiate the study of dynamic regret minimization for goal-oriented reinforcement learning modeled by a non-stationary stochastic shortest path problem with changing cost and transition functions. We start by establishing a lower bound $\Omega\big((B_{\star} S A T_{\star} (\Delta_{c} + B_{\star}^{2} \Delta_{P}))^{1/3} K^{2/3}\big)$, where $B_{\star}$ is the maximum expected cost of the optimal policy of any episode starting from any state, $T_{\star}$ is the maximum hitting time of the optimal policy of any episode starting from the initial state, $SA$ is the number of state-action pairs, $\Delta_{c}$ and $\Delta_{P}$ are the amounts of change in the cost and transition functions respectively, and $K$ is the number of episodes. The different roles of $\Delta_{c}$ and $\Delta_{P}$ in this lower bound inspire us to design algorithms that estimate costs and transitions separately. Specifically, assuming the…
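
To make the scaling in the lower bound concrete, the following is a minimal sketch that evaluates the expression inside the $\Omega(\cdot)$ with all constants dropped. The function name `lower_bound_rate` and its argument names are illustrative assumptions, not taken from the paper, and the returned number indicates only how the bound grows, not an actual regret value.

```python
import math


def lower_bound_rate(B_star, S, A, T_star, delta_c, delta_P, K):
    """Evaluate (B* S A T* (Delta_c + B*^2 Delta_P))^(1/3) * K^(2/3).

    Hypothetical helper: constants hidden by the Omega notation are
    ignored, so only ratios between two calls are meaningful.
    """
    # Transition change is weighted by B*^2, cost change by 1.
    weighted_change = delta_c + (B_star ** 2) * delta_P
    inner = B_star * S * A * T_star * weighted_change
    return inner ** (1.0 / 3.0) * K ** (2.0 / 3.0)


# Multiplying the number of episodes by 8 multiplies the rate by 8^(2/3) = 4.
base = lower_bound_rate(2, 10, 4, 5, delta_c=1.0, delta_P=0.5, K=1_000)
more_episodes = lower_bound_rate(2, 10, 4, 5, delta_c=1.0, delta_P=0.5, K=8_000)
print(math.isclose(more_episodes / base, 4.0))

# With B* = 2, one unit of transition change counts as much as
# four units of cost change.
only_cost = lower_bound_rate(2, 10, 4, 5, delta_c=4.0, delta_P=0.0, K=1_000)
only_transition = lower_bound_rate(2, 10, 4, 5, delta_c=0.0, delta_P=1.0, K=1_000)
print(math.isclose(only_cost, only_transition))
```

The second comparison illustrates the asymmetry the abstract points to: because $\Delta_{P}$ enters multiplied by $B_{\star}^{2}$, changes in transitions and changes in costs contribute differently to the bound.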

Videos

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments (SlidesLive)

Taxonomy

Topics: Smart Grid Energy Management · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research