A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions
Pratik Gajane, Ronald Ortner, and Peter Auer

TL;DR
This paper introduces a sliding-window algorithm for reinforcement learning in non-stationary Markov Decision Processes whose rewards and transition probabilities may change arbitrarily over time, and provides theoretical guarantees and experimental validation.
Contribution
It proposes a novel sliding-window approach for non-stationary MDPs, with regret bounds, optimal window size characterization, and sample complexity analysis.
Findings
Regret bounds for the proposed algorithm
Characterization of optimal window size
Experimental results supporting theoretical claims
Abstract
We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitable for our algorithm. These results are complemented by a sample complexity bound on the number of sub-optimal steps taken by the algorithm. Finally, we present some experimental results to support our theoretical analysis.
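The abstract does not spell out how the sliding window is used. The sketch below is a minimal illustration that assumes the learner forms its empirical reward and transition estimates only from the most recent W observed transitions and discards older ones. This lets the estimates track rewards and transitions that change over time. The class `SlidingWindowEstimator` and its methods are hypothetical names, not the authors' implementation. A complete learner would also need confidence bounds around these estimates and a planning step, both of which the sketch omits.

```python
from collections import defaultdict, deque


class SlidingWindowEstimator:
    """Hypothetical helper: empirical MDP estimates from the last `window` transitions.

    Illustrates the sliding-window idea only; it is not the paper's algorithm.
    """

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.history = deque()                # (s, a, r, s_next) tuples, oldest first
        self.visits = defaultdict(int)        # N(s, a) inside the window
        self.reward_sum = defaultdict(float)  # sum of rewards observed for (s, a)
        self.next_counts = defaultdict(int)   # N(s, a, s_next) inside the window

    def observe(self, s, a, r, s_next):
        # Add the newest transition to the window and the running counts.
        self.history.append((s, a, r, s_next))
        self.visits[(s, a)] += 1
        self.reward_sum[(s, a)] += r
        self.next_counts[(s, a, s_next)] += 1
        # Evict the oldest transition once the window is exceeded, so that
        # observations from the more distant past no longer affect the estimates.
        if len(self.history) > self.window:
            old_s, old_a, old_r, old_next = self.history.popleft()
            self.visits[(old_s, old_a)] -= 1
            self.reward_sum[(old_s, old_a)] -= old_r
            self.next_counts[(old_s, old_a, old_next)] -= 1

    def mean_reward(self, s, a):
        # Empirical mean reward of (s, a) over the window (0.0 if unvisited).
        n = self.visits[(s, a)]
        return self.reward_sum[(s, a)] / n if n else 0.0

    def transition_prob(self, s, a, s_next):
        # Empirical probability of moving to s_next from (s, a) within the window.
        n = self.visits[(s, a)]
        return self.next_counts[(s, a, s_next)] / n if n else 0.0


if __name__ == "__main__":
    est = SlidingWindowEstimator(window=2)
    est.observe(0, 0, 1.0, 1)
    est.observe(0, 0, 0.0, 0)
    print(est.mean_reward(0, 0))        # average of rewards 1.0 and 0.0
    est.observe(0, 0, 0.0, 0)           # the first transition falls out of the window
    print(est.mean_reward(0, 0), est.transition_prob(0, 0, 0))
```

The window size W trades off two effects. A small window adapts quickly after a change but yields noisy estimates. A large window gives tighter estimates but keeps stale data longer. This is the trade-off behind the paper's characterization of the optimal window size.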
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management
