A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

Pratik Gajane, Ronald Ortner, and Peter Auer

arXiv:1805.10066 · cs.LG · May 28, 2018 · 36 cites

Open Access

TL;DR

This paper introduces a sliding-window algorithm for reinforcement learning in non-stationary Markov Decision Processes with changing rewards and transitions, providing theoretical guarantees and experimental validation.

Contribution

It proposes a novel sliding-window approach for non-stationary MDPs, with regret bounds, optimal window size characterization, and sample complexity analysis.

Findings

01. Regret bounds for the proposed algorithm

02. Characterization of optimal window size

03. Experimental results supporting theoretical claims

Abstract

We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitable for our algorithm. These results are complemented by a sample complexity bound on the number of sub-optimal steps taken by the algorithm. Finally, we present some experimental results to support our theoretical analysis.
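
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the sliding-window idea, not the authors' method. It assumes a tabular MDP and keeps empirical reward and transition estimates computed from the most recent `window` observations only, so that older data from an outdated MDP is forgotten. The class name `SlidingWindowEstimator` and all of its methods are hypothetical names introduced for illustration.

```python
from collections import deque


class SlidingWindowEstimator:
    """Illustrative (hypothetical) estimator: uses only the last `window` steps."""

    def __init__(self, window):
        # A bounded deque silently drops the oldest observation once it is full.
        self.buffer = deque(maxlen=window)

    def observe(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def _matching(self, state, action):
        # All windowed observations taken in (state, action).
        return [(r, s2) for (s, a, r, s2) in self.buffer
                if s == state and a == action]

    def visits(self, state, action):
        return len(self._matching(state, action))

    def mean_reward(self, state, action):
        obs = self._matching(state, action)
        if not obs:
            return None  # no data for this pair inside the window
        return sum(r for r, _ in obs) / len(obs)

    def transition_prob(self, state, action, next_state):
        obs = self._matching(state, action)
        if not obs:
            return None
        return sum(1 for _, s2 in obs if s2 == next_state) / len(obs)


# Toy usage: the environment changes after 10 steps.
est = SlidingWindowEstimator(window=4)
for _ in range(10):
    est.observe(0, 0, 1.0, 0)   # old regime: reward 1, stay in state 0
for _ in range(4):
    est.observe(0, 0, 0.0, 1)   # new regime: reward 0, move to state 1

print(est.mean_reward(0, 0))         # 0.0, the old regime has left the window
print(est.transition_prob(0, 0, 1))  # 1.0
```

The sketch recomputes estimates from the buffer on every query, which costs time linear in the window size but avoids floating-point drift from incremental updates. The window size trades off adaptivity against estimation accuracy: a short window forgets outdated data quickly but leaves fewer samples per state-action pair.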

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management