Online Reinforcement Learning in Periodic MDP

Ayush Aniket; Arpan Chattopadhyay

arXiv:2303.09629 · cs.LG · March 20, 2023 · 1 citation

TL;DR

This paper addresses learning in periodic Markov decision processes (MDPs) by proposing upper-confidence-bound algorithms that account for the periodicity, achieving regret bounds that depend on the period and the horizon, with further gains when the sparsity of the transitions is exploited.

Contribution

The paper introduces the PUCRL2 and PUCRLB algorithms for periodic MDPs, with regret bounds that depend on the period and horizon, and extends them to the unknown-period setting with U-PUCRL2 and U-PUCRLB.

Findings

01

PUCRL2's regret grows linearly in the period N and sublinearly, as O(sqrt(T log T)), in the horizon T.

02

PUCRLB improves the regret's dependence on the period from N to sqrt(N).

03

The proposed algorithms perform well in numerical experiments, with PUCRLB also improving on PUCRL2 empirically.
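A rough intuition for the sqrt(N) improvement, sketched in Python under stated assumptions. In the augmented MDP each state is a pair (s, n) of a base state and a period index, so there are S·N states, yet from index n only the S states with index (n + 1) mod N are reachable. If each transition estimate carried a UCRL2-style L1 confidence radius (an assumption; this page does not state the confidence set PUCRLB actually uses), restricting the support from S·N to S next states would shrink that radius by a factor of sqrt(N). The function `ucrl2_l1_radius` and all problem sizes below are hypothetical, chosen only for illustration.

```python
import math


def ucrl2_l1_radius(num_next_states, num_actions, t, delta, visits):
    """UCRL2-style L1 confidence radius for one (state, action) pair.

    Assumed form: sqrt(14 * S' * log(2 * A * t / delta) / max(1, visits)),
    where S' is the number of next states the empirical estimate ranges over.
    Hypothetical helper for illustration; not taken from the paper.
    """
    return math.sqrt(
        14 * num_next_states * math.log(2 * num_actions * t / delta) / max(1, visits)
    )


# Hypothetical sizes: base states S, period N, actions A.
S, N, A = 10, 24, 4
t, delta, visits = 10_000, 0.05, 50

# Generic treatment: the estimate ranges over all S * N augmented states.
dense = ucrl2_l1_radius(S * N, A, t, delta, visits)
# Sparsity-aware treatment: only the S states at the next period index are reachable.
sparse = ucrl2_l1_radius(S, A, t, delta, visits)

print(f"dense={dense:.3f} sparse={sparse:.3f} ratio={dense / sparse:.3f}")
print(f"sqrt(N)={math.sqrt(N):.3f}")
```

The ratio equals sqrt(N) regardless of t, delta, or the visit count, because those terms cancel; how this translates into the regret bound depends on the paper's analysis.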

Abstract

We study learning in a periodic Markov decision process (MDP), a special type of non-stationary MDP in which both the state transition probabilities and the reward functions vary periodically, under the average reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period $N$ and as $O(\sqrt{T \log T})$ with the horizon length $T$. Utilizing information about the sparsity of the transition matrix of the augmented MDP, we propose another algorithm, PUCRLB, which improves upon PUCRL2 both in terms of regret ($O(\sqrt{N})$ dependency on the period) and empirical performance. Finally, we propose two other algorithms, U-PUCRL2 and U-PUCRLB, for extended uncertainty in the environment…
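The state-augmentation step described in the abstract can be sketched as follows. This is a minimal NumPy illustration under assumptions of our own: the per-index transition kernels are given as an array `P` of shape (N, S, A, S), mean rewards as `R` of shape (N, S, A), the dynamics at time t use index t mod N, and the augmented state (s, n) is encoded as the integer n·S + s. The helper `augment_periodic_mdp` is hypothetical, not the authors' code.

```python
import numpy as np


def augment_periodic_mdp(P, R):
    """Build a stationary MDP on augmented states (s, n). Hypothetical helper.

    P: shape (N, S, A, S); P[n, s, a, s2] is the probability of moving
       from s to s2 under action a at period index n.
    R: shape (N, S, A); R[n, s, a] is the mean reward at period index n.
    Returns (P_aug, R_aug) over S * N augmented states, where the pair
    (s, n) is encoded as the integer n * S + s (an assumed encoding).
    """
    N, S, A, _ = P.shape
    P_aug = np.zeros((N * S, A, N * S))
    R_aug = np.zeros((N * S, A))
    for n in range(N):
        n_next = (n + 1) % N  # the period index advances deterministically
        for s in range(S):
            i = n * S + s
            R_aug[i] = R[n, s]
            # Only augmented states carrying index n_next receive probability mass.
            P_aug[i, :, n_next * S:(n_next + 1) * S] = P[n, s]
    return P_aug, R_aug


# Usage on a random periodic MDP with period N = 3.
rng = np.random.default_rng(0)
N, S, A = 3, 4, 2
P = rng.random((N, S, A, S))
P /= P.sum(axis=-1, keepdims=True)  # normalise each row into a distribution
R = rng.random((N, S, A))

P_aug, R_aug = augment_periodic_mdp(P, R)
print(P_aug.shape)  # (12, 2, 12)
print(np.allclose(P_aug.sum(axis=-1), 1.0))  # True
print(np.count_nonzero(P_aug[0, 0]), "nonzeros out of", N * S)
```

Each row of the augmented transition matrix has at most S nonzero entries out of S·N, which is presumably the sparsity that PUCRLB exploits.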
