Relax but stay in control: from value to algorithms for online Markov decision processes
Peng Guan, Maxim Raginsky, Rebecca Willett

TL;DR
This paper develops a general framework for deriving algorithms for online Markov decision processes (MDPs) whose costs change arbitrarily over time. The framework unifies existing methods and yields new ones.
Contribution
The paper extends the relaxation-based approach of Rakhlin et al. from online learning to MDPs with arbitrarily changing costs, giving a single framework for algorithm design that supports both regret analysis and computationally efficient methods.
Findings
Introduces a unifying framework for online MDP algorithms.
Develops new algorithms with advantages over existing methods.
Shows that one of the new algorithms has advantages over a similar method derived from scratch via an online version of approximate dynamic programming.
Abstract
Online learning algorithms are designed to perform in non-stationary environments, but generally there is no notion of a dynamic state to model constraints on current and future actions as a function of past actions. State-based models are common in stochastic control settings, but commonly used frameworks such as Markov Decision Processes (MDPs) assume a known stationary environment. In recent years, there has been a growing interest in combining the above two frameworks and considering an MDP setting in which the cost function is allowed to change arbitrarily after each time step. However, most of the work in this area has been algorithmic: given a problem, one would develop an algorithm almost from scratch. Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally…
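The setting described in the abstract can be illustrated with a minimal sketch: a known transition kernel, a cost function that changes at every step, and a comparison between a learner's expected cumulative cost and that of the best fixed policy in hindsight. Everything below is an assumption made for illustration and is not taken from the paper: the two-state toy kernel `P`, the random cost sequence, the uniform "learner" policy, and the helper names `expected_total_cost` and `best_deterministic_stationary`. For simplicity the comparator is restricted to deterministic stationary policies.

```python
import itertools
import random


def expected_total_cost(P, costs, policy, mu0):
    """Expected cumulative cost of a stationary policy.

    P[a][s][s2]    : known transition probabilities (hypothetical toy kernel)
    costs[t][s][a] : cost at step t, allowed to change arbitrarily with t
    policy[s][a]   : probability of taking action a in state s
    mu0[s]         : initial state distribution
    """
    n_actions, n_states = len(P), len(mu0)
    mu = list(mu0)
    total = 0.0
    for c_t in costs:
        nxt = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                w = mu[s] * policy[s][a]  # probability of (state s, action a)
                total += w * c_t[s][a]
                for s2 in range(n_states):
                    nxt[s2] += w * P[a][s][s2]
        mu = nxt  # propagate the state distribution one step
    return total


def best_deterministic_stationary(P, costs, mu0):
    """Brute-force the best deterministic stationary policy in hindsight."""
    n_actions, n_states = len(P), len(mu0)
    best_cost, best_choice = float("inf"), None
    for choice in itertools.product(range(n_actions), repeat=n_states):
        policy = [[1.0 if a == choice[s] else 0.0 for a in range(n_actions)]
                  for s in range(n_states)]
        cost = expected_total_cost(P, costs, policy, mu0)
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


# Toy instance: 2 states, 2 actions, costs drawn fresh at every step.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # transitions under action 0
    [[0.5, 0.5], [0.6, 0.4]],  # transitions under action 1
]
horizon = 50
rng = random.Random(0)
costs = [[[rng.random() for _ in range(2)] for _ in range(2)]
         for _ in range(horizon)]
mu0 = [1.0, 0.0]

uniform_policy = [[0.5, 0.5], [0.5, 0.5]]  # placeholder learner, not a real algorithm
learner_cost = expected_total_cost(P, costs, uniform_policy, mu0)
comparator_cost, comparator_choice = best_deterministic_stationary(P, costs, mu0)
regret = learner_cost - comparator_cost
print(f"learner={learner_cost:.3f} comparator={comparator_cost:.3f} regret={regret:.3f}")
```

The brute-force comparator enumerates every deterministic stationary policy, which is only feasible for tiny state and action spaces; it is here to make the regret quantity concrete, not to suggest how an online algorithm would be computed.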
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
